Staphylococcus aureus Polynucleotides and Sequences

FIELD OF THE INVENTION 

[0001] This application is a divisional of U.S. Application No. 08/781,986, filed 
January 3, 1997, which claims benefit under 35 U.S.C. section 119(e) of U.S.
Provisional Application Serial No. 60/009,861, filed January 5, 1996. 
[0002] The present invention relates to the field of molecular biology. In 
particular, it relates to, among other things, nucleotide sequences of 
Staphylococcus aureus, contigs, ORFs, fragments, probes, primers and related 
polynucleotides thereof, peptides and polypeptides encoded by the sequences, and 
uses of the polynucleotides and sequences thereof, such as in fermentation, 
polypeptide production, assays and pharmaceutical development, among others. 

BACKGROUND OF THE INVENTION 

[0003] The genus Staphylococcus includes at least 20 distinct species. (For a 
review see Novick, R. P., The Staphylococcus as a Molecular Genetic System, 
Chapter 1, pgs. 1-37 in MOLECULAR BIOLOGY OF THE STAPHYLOCOCCI 
R. Novick, Ed., VCH Publishers, New York (1990)). Species differ from one 
another by 80% or more, by hybridization kinetics, whereas strains within a 
species are at least 90% identical by the same measure. 

[0004] The species Staphylococcus aureus, a gram-positive, facultatively aerobic, 
clump-forming coccus, is among the most important etiological agents of bacterial
infection in humans, as discussed briefly below. 

[0005] Human Health and S. aureus

[0006] Staphylococcus aureus is a ubiquitous pathogen. (See, for instance, Mims 
et al., MEDICAL MICROBIOLOGY, Mosby-Year Book Europe Limited,
London, UK (1993)). It is an etiological agent of a variety of conditions, ranging 
in severity from mild to fatal. A few of the more common conditions caused by 
S. aureus infection are burns, cellulitis, eyelid infections, food poisoning, joint 
infections, neonatal conjunctivitis, osteomyelitis, skin infections, surgical wound
infection, scalded skin syndrome and toxic shock syndrome, some of which are
described further below. 

[0007] Burns 

[0008] Burn wounds generally are sterile initially. However, they generally 
compromise physical and immune barriers to infection, cause loss of fluid and 
electrolytes and result in local or general physiological dysfunction. After 
cooling, contact with viable bacteria results in mixed colonization at the injury 
site. Infection may be restricted to the non-viable debris on the burn surface 
("eschar"), it may progress into full skin infection and invade viable tissue below 
the eschar and it may reach below the skin, enter the lymphatic and blood 
circulation and develop into septicaemia. S. aureus is among the most important 
pathogens typically found in burn wound infections. It can destroy granulation 
tissue and produce severe septicaemia. 

[0009] Cellulitis 

[0010] Cellulitis, an acute infection of the skin that expands from a typically 
superficial origin to spread below the cutaneous layer, most commonly is caused 
by S. aureus in conjunction with S. pyogenes. Cellulitis can lead to systemic
infection. In fact, cellulitis can be one aspect of synergistic bacterial gangrene. 
This condition typically is caused by a mixture of S. aureus and microaerophilic 
streptococci. It causes necrosis and treatment is limited to excision of the 
necrotic tissue. The condition often is fatal. 

[0011] Eyelid infections 

[0012] S. aureus is the cause of styes and of "sticky eye" in neonates, among other
eye infections. Typically such infections are limited to the surface of the eye, but
they occasionally penetrate the surface, with more severe consequences.

[0013] Food poisoning 

[0014] Some strains of S. aureus produce one or more of five serologically 
distinct, heat and acid stable enterotoxins that are not destroyed by the digestive
processes of the stomach and small intestine (enterotoxins A-E). Ingestion of the
toxin, in sufficient quantities, typically results in severe vomiting, but not
diarrhoea. The effect does not require viable bacteria. Although the toxins are
known, their mechanism of action is not understood. 

[0015] Joint infections 

[0016] S. aureus infects bone joints, causing diseases such as osteomyelitis.
[0017] Osteomyelitis 

[0018] S. aureus is the most common causative agent of haematogenous 
osteomyelitis. The disease tends to occur in children and adolescents more than 
adults and it is associated with non-penetrating injuries to bones. Infection 
typically occurs in the long end of growing bone, hence its occurrence in 
physically immature populations. Most often, infection is localized in the vicinity 
of sprouting capillary loops adjacent to epiphysial growth plates in the end of 
long, growing bones. 

[0019] Skin infections 

[0020] S. aureus is the most common pathogen of such minor skin infections as 
abscesses and boils. Such infections often are resolved by normal host response 
mechanisms, but they also can develop into severe internal infections. Recurrent 
infections of the nasal passages plague nasal carriers of S. aureus. 

[0021] Surgical Wound Infections 

[0022] Surgical wounds often penetrate far into the body. Infection of such
wounds thus poses a grave risk to the patient. S. aureus is the most important
causative agent of infections in surgical wounds. S. aureus is unusually adept at
invading surgical wounds; sutured wounds can be infected by far fewer S. aureus
cells than are necessary to cause infection in normal skin. Invasion of a surgical
wound can lead to severe S. aureus septicaemia. Invasion of the blood stream by
S. aureus can lead to seeding and infection of internal organs, particularly heart 
valves and bone, causing systemic diseases, such as endocarditis and 
osteomyelitis. 

[0023] Scalded Skin Syndrome 

[0024] S. aureus is responsible for "scalded skin syndrome" (also called toxic 
epidermal necrosis, Ritter's disease and Lyell's disease). This disease occurs in
older children, typically in outbreaks caused by flowering of S. aureus strains
that produce exfoliatin (also called scalded skin syndrome toxin). Although the
bacteria initially may infect only a minor lesion, the toxin destroys intercellular
connections, spreads epidermal layers and allows the infection to penetrate the
outer layer of the skin, producing the desquamation that typifies the disease.
Shedding of the outer layer of skin generally reveals normal skin below, but fluid 
lost in the process can produce severe injury in young children if it is not treated 
properly. 

[0025] Toxic Shock Syndrome 

[0026] Toxic shock syndrome is caused by strains of S. aureus that produce the 
so-called toxic shock syndrome toxin. The disease can be caused by S. aureus 
infection at any site, but it is too often erroneously viewed exclusively as a 
disease solely of women who use tampons. The disease involves toxaemia and 
septicaemia, and can be fatal. 

[0027] Nosocomial Infections

[0028] In the 1984 National Nosocomial Infection Surveillance Study ("NNIS")
S. aureus was the most prevalent agent of surgical wound infections in many 
hospital services, including medicine, surgery, obstetrics, pediatrics and 
newborns. 

[0029] Resistance to drugs of S. aureus strains 

[0030] Prior to the introduction of penicillin the prognosis for patients seriously 
infected with S. aureus was unfavorable. Following the introduction of penicillin 
in the early 1940s even the worst S. aureus infections generally could be treated
successfully. The emergence of penicillin-resistant strains of S. aureus did not 
take long, however. Most strains of S. aureus encountered in hospital infections 
today do not respond to penicillin; although, fortunately, this is not the case for S. 
aureus encountered in community infections. 

[0031] It is well known now that penicillin-resistant strains of S. aureus produce
a lactamase which converts penicillin to penicilloic acid, and thereby destroys
antibiotic activity. Furthermore, the lactamase gene often is propagated 
episomally, typically on a plasmid, and often is only one of several genes on an 
episomal element that, together, confer multidrug resistance. 
[0032] Methicillins, introduced in the 1960s, largely overcame the problem of
penicillin resistance in S. aureus. These compounds conserve the portions of 
penicillin responsible for antibiotic activity and modify or alter other portions that 
make penicillin a good substrate for inactivating lactamases. However, 
methicillin resistance has emerged in S. aureus, along with resistance to many 
other antibiotics effective against this organism, including aminoglycosides, 
tetracycline, chloramphenicol, macrolides and lincosamides. In fact, methicillin- 
resistant strains of S. aureus generally are multiply drug resistant. 
[0033] The molecular genetics of most types of drug resistance in S. aureus has 
been elucidated (See Lyon et al., Microbiology Reviews 51: 88-134 (1987)).
Generally, resistance is mediated by plasmids, as noted above regarding penicillin 
resistance; however, several stable forms of drug resistance have been observed 
that apparently involve integration of a resistance element into the S. aureus 
genome itself. 

[0034] Thus far each new antibiotic gives rise to resistant strains, strains emerge
that are resistant to multiple drugs and increasingly persistent forms of
resistance begin to emerge. Drug resistance of S. aureus infections already poses 
significant treatment difficulties, which are likely to get much worse unless new 
therapeutic agents are developed. 

[0035] Molecular Genetics of Staphylococcus aureus

[0036] Despite its importance in, among other things, human disease, relatively 
little is known about the genome of this organism. 

[0037] Most genetic studies of S. aureus have been carried out using the
strain NCTC8325, which contains prophages psi11, psi12 and psi13, and the
UV-cured derivative of this strain, 8325-4 (also referred to as RN450), which is free
of the prophages. 

[0038] These studies revealed that the S. aureus genome, like that of other 
staphylococci, consists of one circular, covalently closed, double-stranded DNA 
and a collection of so-called variable accessory genetic elements, such as 
prophages, plasmids, transposons and the like. Physical characterization of the 
genome has not been carried out in any detail. Pattee et al. published a low 
resolution and incomplete genetic and physical map of the chromosome of S. 
aureus strain NCTC 8325. (Pattee et al., Genetic and Physical Mapping of
Chromosome of Staphylococcus aureus NCTC 8325, Chapter 11, pgs. 163-169
in MOLECULAR BIOLOGY OF THE STAPHYLOCOCCI R.P. Novick, Ed.,
VCH Publishers, New York, (1990)). The genetic map largely was produced by
mapping insertions of Tn551 and Tn4001, which, respectively, confer
erythromycin and gentamicin resistance, and by analysis of SmaI-digested DNA
by Pulsed Field Gel Electrophoresis ("PFGE").

[0039] The map was of low resolution; even estimating the physical size of the 
genome was difficult, according to the investigators. The largest SmaI
chromosome fragment, for instance, was too large for accurate sizing by PFGE.
To estimate its size, additional restriction sites had to be introduced into the
chromosome using a transposon containing a SmaI recognition sequence.
[0040] In sum, most physical characteristics and almost all of the genes of 
Staphylococcus aureus are unknown. Among the few genes that have been 
identified, most have not been physically mapped or characterized in detail. Only 
a very few genes of this organism have been sequenced. (See, for instance,
Thornsberry, J. Antimicrobial Chemotherapy 21 Suppl. C: 9-16 (1988), current
versions of GENBANK and other nucleic acid databases, and references that 
relate to the genome of S. aureus such as those set out elsewhere herein.) 
[0041] It is clear that the etiology of diseases mediated or exacerbated by S.
aureus infection involves the programmed expression of S. aureus genes, and that
characterizing the genes and their patterns of expression would add dramatically 
to our understanding of the organism and its host interactions. Knowledge of S. 
aureus genes and genomic organization would dramatically improve 
understanding of disease etiology and lead to improved and new ways of 
preventing, ameliorating, arresting and reversing diseases. Moreover, 
characterized genes and genomic fragments of S. aureus would provide reagents 
for, among other things, detecting, characterizing and controlling S. aureus 
infections. There is a need therefore to characterize the genome of S. aureus and 
for polynucleotides and sequences of this organism. 



SUMMARY OF THE INVENTION 



[0042] The present invention is based on the sequencing of fragments of the 
Staphylococcus aureus genome. The primary nucleotide sequences which were 
generated are provided in SEQ ID NOS: 1-5,191. 
[0043] The present invention provides the nucleotide sequence of several
thousand contigs of the Staphylococcus aureus genome, which are listed in tables 
below and set out in the Sequence Listing submitted herewith, and representative 
fragments thereof, in a form which can be readily used, analyzed, and interpreted 
by a skilled artisan. In one embodiment, the present invention is provided as 
contiguous strings of primary sequence information corresponding to the 
nucleotide sequences depicted in SEQ ID NOS: 1-5,191.

[0044] The present invention further provides nucleotide sequences which are at 
least 95% identical to the nucleotide sequences of SEQ ID NOS: 1-5,191. 
[0045] The nucleotide sequence of SEQ ID NOS: 1-5,191, a representative 
fragment thereof, or a nucleotide sequence which is at least 95% identical to the 
nucleotide sequence of SEQ ID NOS: 1-5,191 may be provided in a variety of
mediums to facilitate its use. In one application of this embodiment, the
sequences of the present invention are recorded on computer readable media.
Such media include, but are not limited to: magnetic storage media, such as floppy
discs, hard disc storage medium, and magnetic tape; optical storage media such as 
CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these 
categories such as magnetic/optical storage media. 

[0046] The present invention further provides systems, particularly computer- 
based systems which contain the sequence information herein described stored in 
a data storage means. Such systems are designed to identify commercially 
important fragments of the Staphylococcus aureus genome. 
[0047] Another embodiment of the present invention is directed to fragments of 
the Staphylococcus aureus genome having particular structural or functional 
attributes. Such fragments of the Staphylococcus aureus genome of the present 
invention include, but are not limited to, fragments which encode peptides, 
hereinafter referred to as open reading frames or "ORFs," fragments which
modulate the expression of an operably linked ORF, hereinafter referred to as
expression modulating fragments or "EMFs," and fragments which can be used to
diagnose the presence of Staphylococcus aureus in a sample, hereinafter referred
to as diagnostic fragments or "DFs."

[0048] Each of the ORFs in fragments of the Staphylococcus aureus genome 
disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in 
numerous ways as polynucleotide reagents. For instance, the sequences can be 
used as diagnostic probes or amplification primers for detecting or determining 
the presence of a specific microbe in a sample, to selectively control gene
expression in a host and in the production of polypeptides, such as polypeptides
encoded by ORFs of the present invention, particularly those polypeptides that have
a pharmacological activity. 

[0049] The present invention further includes recombinant constructs comprising 
one or more fragments of the Staphylococcus aureus genome of the present 
invention. The recombinant constructs of the present invention comprise vectors, 
such as a plasmid or viral vector, into which a fragment of the Staphylococcus
aureus genome has been inserted.

[0050] The present invention further provides host cells containing any of the 
isolated fragments of the Staphylococcus aureus genome of the present invention. 
The host cells can be a higher eukaryotic host cell, such as a mammalian cell, a 
lower eukaryotic cell, such as a yeast cell, or a procaryotic cell such as a bacterial 
cell. 

[0051] The present invention is further directed to isolated polypeptides and 
proteins encoded by ORFs of the present invention. A variety of methods, well 
known to those of skill in the art, routinely may be utilized to obtain any of the 
polypeptides and proteins of the present invention. For instance, polypeptides 
and proteins of the present invention having relatively short, simple amino acid 
sequences readily can be synthesized using commercially available automated 
peptide synthesizers. Polypeptides and proteins of the present invention also may 
be purified from bacterial cells which naturally produce the protein. Yet another 
alternative is to purify polypeptides and proteins of the present invention from
cells which have been altered to express them.

[0052] The invention further provides polypeptides comprising Staphylococcus 
aureus epitopes and vaccine compositions comprising such polypeptides. Also 
provided are methods for vaccinating an individual against Staphylococcus
aureus infection. 

[0053] The invention further provides methods of obtaining homologs of the 
fragments of the Staphylococcus aureus genome of the present invention and 
homologs of the proteins encoded by the ORFs of the present invention. 
Specifically, by using the nucleotide and amino acid sequences disclosed herein 
as a probe or as primers, and techniques such as PCR cloning and colony/plaque 
hybridization, one skilled in the art can obtain homologs. 
[0054] The invention further provides antibodies which selectively bind
polypeptides and proteins of the present invention. Such antibodies include both 
monoclonal and polyclonal antibodies. 

[0055] The invention further provides hybridomas which produce the above- 
described antibodies. A hybridoma is an immortalized cell line which is capable 
of secreting a specific monoclonal antibody. 

[0056] The present invention further provides methods of identifying test samples 
derived from cells which express one of the ORFs of the present invention, or a 
homolog thereof. Such methods comprise incubating a test sample with one or 
more of the antibodies of the present invention, or one or more of the DFs or
antigens of the present invention, under conditions which allow a skilled artisan 
to determine if the sample contains the ORF or product produced therefrom. 
[0057] In another embodiment of the present invention, kits are provided which 
contain the necessary reagents to carry out the above-described assays. 
[0058] Specifically, the invention provides a compartmentalized kit to receive, in 
close confinement, one or more containers which comprises: (a) a first container 
comprising one of the antibodies, antigens, or one of the DFs of the present 
invention; and (b) one or more other containers comprising one or more of the 
following: wash reagents, reagents capable of detecting presence of bound 
antibodies, antigens or hybridized DFs. 

[0059] Using the isolated proteins of the present invention, the present invention 
further provides methods of obtaining and identifying agents capable of binding 
to a polypeptide or protein encoded by one of the ORFs of the present invention. 
Specifically, such agents include, as further described below, antibodies, peptides, 
carbohydrates, pharmaceutical agents and the like. Such methods comprise steps 
of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of
the present invention; and (b) determining whether the agent binds to said protein.
[0060] The present genomic sequences of Staphylococcus aureus will be of great 
value to all laboratories working with this organism and for a variety of 
commercial purposes. Many fragments of the Staphylococcus aureus genome 
will be immediately identified by similarity searches against GenBank or protein 
databases and will be of immediate value to Staphylococcus aureus researchers 
and for immediate commercial value for the production of proteins or to control 
gene expression. 
[0061] The methodology and technology for elucidating extensive genomic
sequences of bacterial and other genomes have enhanced and will greatly enhance the ability
to analyze and understand chromosomal organization. In particular, sequenced 
contigs and genomes will provide the models for developing tools for the analysis 
of chromosome structure and function, including the ability to identify genes 
within large segments of genomic DNA, the structure, position, and spacing of 
regulatory elements, the identification of genes with potential industrial 
applications, and the ability to do comparative genomics and molecular phylogeny.



DESCRIPTION OF THE FIGURES 



[0062] FIGURE 1 is a block diagram of a computer system (102) that can be
used to implement computer-based systems of the present invention.
[0063] FIGURE 2 is a schematic diagram depicting the data flow and computer 
programs used to collect, assemble, edit and annotate the contigs of the 
Staphylococcus aureus genome of the present invention. Both Macintosh and 
Unix platforms are used to handle the AB 373 and 377 sequence data files, 
largely as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual
Hawaii International Conference on System Sciences, 585, IEEE Computer
Society Press, Washington D.C. (1993). Factura (AB) is a Macintosh program 
designed for automatic vector sequence removal and end-trimming of sequence 
files. The program Loadis runs on a Macintosh platform and parses the feature 
data extracted from the sequence files by Factura to the Unix based 
Staphylococcus aureus relational database. Assembly of contigs (and whole 
genome sequences) is accomplished by retrieving a specific set of sequence files 
and their associated features using extrseq, a Unix utility for retrieving sequences 
from an SQL database. The resulting sequence file is processed by seq_filter to 
trim portions of the sequences with more than 2% ambiguous nucleotides. The 
sequence files were assembled using TIGR Assembler, an assembly engine 
designed at The Institute for Genomic Research ("TIGR") for rapid and accurate
assembly of thousands of sequence fragments. The collection of contigs 
generated by the assembly step is loaded into the database with the lassie 
program. Identification of open reading frames (ORFs) is accomplished by 
processing contigs with zorf. The ORFs are searched against S. aureus sequences 
from GenBank and against all protein sequences using the BLASTN and BLASTP
programs, described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990).
Results of the ORF determination and similarity searching steps were loaded into
the database. As described below, some results of the determination and the
searches are set out in Tables 1-3.
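
The trimming step attributed to seq_filter can be illustrated with a minimal sketch. The internals of seq_filter are not described here, so the sliding-window rule, the 50-base window and the function names below are assumptions chosen only for illustration; only the 2% threshold for ambiguous nucleotides is taken from the description above.

```python
# Illustrative sketch only; not the seq_filter program itself.
# IUPAC nucleotide codes other than A, C, G and T are treated as ambiguous.
AMBIGUOUS = set("NRYKMSWBDHV")


def ambiguous_fraction(seq):
    """Return the fraction of bases in seq that are ambiguous codes."""
    if not seq:
        return 0.0
    return sum(base.upper() in AMBIGUOUS for base in seq) / len(seq)


def trim_ambiguous_ends(seq, window=50, max_fraction=0.02):
    """Trim bases from each end while the terminal window is too ambiguous.

    The window size is an assumed parameter; max_fraction=0.02 mirrors the
    2% threshold mentioned in the text.
    """
    start, end = 0, len(seq)
    # Advance the 5' boundary until the leading window passes the threshold.
    while end - start >= window and \
            ambiguous_fraction(seq[start:start + window]) > max_fraction:
        start += 1
    # Retreat the 3' boundary until the trailing window passes the threshold.
    while end - start >= window and \
            ambiguous_fraction(seq[end - window:end]) > max_fraction:
        end -= 1
    return seq[start:end]
```

With these assumed settings a read keeps at most one ambiguous base in each terminal 50-base window; a stricter or looser rule would only require changing the two parameters.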



DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 



[0064] The present invention is based on the sequencing of fragments of the 
Staphylococcus aureus genome and analysis of the sequences. The primary 
nucleotide sequences generated by sequencing the fragments are provided in SEQ 
ID NOS: 1-5,191. (As used herein, the "primary sequence" refers to the
nucleotide sequence represented by the IUPAC nomenclature system.) 
[0065] In addition to the aforementioned Staphylococcus aureus polynucleotide 
and polynucleotide sequences, the present invention provides the nucleotide 
sequences of SEQ ID NOS: 1-5,191, or representative fragments thereof, in a form 
which can be readily used, analyzed, and interpreted by a skilled artisan. 
[0066] As used herein, a "representative fragment of the nucleotide sequence
depicted in SEQ ID NOS: 1-5,191" refers to any portion of SEQ ID NOS: 1-5,191
which is not presently represented within a publicly available database.
Preferred representative fragments of the present invention are Staphylococcus
aureus open reading frames ("ORFs"), expression modulating fragments ("EMFs")
and fragments which can be used to diagnose the presence of Staphylococcus
aureus in a sample ("DFs"). A non-limiting identification of preferred
representative fragments is provided in Tables 1-3. 
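
For illustration, one common way to locate candidate ORFs in a contig is a six-frame scan for a start codon followed in frame by a stop codon. The sketch below is not the zorf program; the ATG-only start rule, the minimum length and the function names are assumptions made for the example.

```python
# Illustrative six-frame ORF scan; a sketch under stated assumptions,
# not the ORF-calling program used to prepare Tables 1-3.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end, orf) tuples for ATG-to-stop stretches.

    Coordinates are 0-based, end-exclusive, and for the "-" strand they
    refer to the reverse-complemented sequence. min_codons counts codons
    before the stop codon and is an assumed cutoff.
    """
    orfs = []
    for strand, dna in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(dna) - 2, 3):
                codon = dna[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, start, pos + 3, dna[start:pos + 3]))
                    start = None  # resume the search after the stop codon
    return orfs
```

Bacterial genes may also use alternative start codons, so a production tool would likely relax the ATG-only rule assumed here.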

[0067] As discussed in detail below, the information provided in SEQ ID
NOS: 1-5,191 and in Tables 1-3 together with routine cloning, synthesis, sequencing and
assay methods will enable those skilled in the art to clone and sequence all 
"representative fragments" of interest, including open reading frames encoding a 
large variety of Staphylococcus aureus proteins. 

[0068] While the presently disclosed sequences of SEQ ID NOS: 1-5,191 are 
highly accurate, sequencing techniques are not perfect and, in relatively rare 
instances, further investigation of a fragment or sequence of the invention may 
reveal a nucleotide sequence error present in a nucleotide sequence disclosed in 
SEQ ID NOS: 1-5,191. However, once the present invention is made available
(i.e., once the information in SEQ ID NOS: 1-5,191 and Tables 1-3 has been made
available), resolving a rare sequencing error in SEQ ID NOS: 1-5,191 will be well
within the skill of the art. The present disclosure makes available sufficient 
sequence information to allow any of the described contigs or portions thereof to 
be obtained readily by straightforward application of routine techniques. Further 
sequencing of such polynucleotides may proceed in like manner using manual and
automated sequencing methods which are employed ubiquitously in the art.
Nucleotide sequence editing software is publicly available. For example, Applied
Biosystems' (AB) AutoAssembler can be used as an aid during visual inspection
of nucleotide sequences. By employing such routine techniques potential errors 
readily may be identified and the correct sequence then may be ascertained by 
targeting further sequencing effort, also of a routine nature, to the region 
containing the potential error. 

[0069] Even if all of the very rare sequencing errors in SEQ ID NOS: 1-5,191
were corrected, the resulting nucleotide sequences would still be at least 95% 
identical, nearly all would be at least 99% identical, and the great majority would 
be at least 99.9% identical to the nucleotide sequences of SEQ ID NOS: 1-5,191.
[0070] As discussed elsewhere herein, polynucleotides of the present invention
readily may be obtained by routine application of well known and standard 
procedures for cloning and sequencing DNA. Detailed methods for obtaining 
libraries and for sequencing are provided below, for instance. A wide variety of 
Staphylococcus aureus strains that can be used to prepare S. aureus genomic DNA
for cloning and for obtaining polynucleotides of the present invention are
available to the public from recognized depository institutions, such as the
American Type Culture Collection ("ATCC").

[0071] The nucleotide sequences of the genomes from different strains of 
Staphylococcus aureus differ somewhat. However, the nucleotide sequences of 
the genomes of all Staphylococcus aureus strains will be at least 95% identical, in 
corresponding part, to the nucleotide sequences provided in SEQ ID NOS: 1-5,191.
Nearly all will be at least 99% identical and the great majority will be
99.9% identical. 

[0072] Thus, the present invention further provides nucleotide sequences which 
are at least 95%, preferably 99% and most preferably 99.9% identical to the 
nucleotide sequences of SEQ ID NOS: 1-5,191, in a form which can be readily
used, analyzed and interpreted by the skilled artisan. 
[0073] Methods for determining whether a nucleotide sequence is at least 95%, at
least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID 
NOS: 1-5,191 are routine and readily available to the skilled artisan. For example, 
the well known fasta algorithm described in Pearson and Lipman, Proc. Natl. 
Acad. Sci. USA 85: 2444 (1988) can be used to generate the percent identity of 
nucleotide sequences. The BLASTN program also can be used to generate an 
identity score of polynucleotides compared to one another. 
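
As a minimal sketch of the arithmetic behind a percent identity figure, the example below scores two sequences that have already been aligned (for instance by FASTA or BLASTN, which are not reimplemented here). The choice of denominator (all alignment columns, gaps included) and the function names are assumptions; alignment programs differ in how they report identity.

```python
# Illustrative percent-identity calculation on a pre-computed alignment.
# Gap characters are written as "-" and both strings must have equal length.

def percent_identity(aligned_a, aligned_b):
    """Return identical columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(a == b and a != "-" for a, b in pairs)
    return 100.0 * matches / len(aligned_a)


def identity_tier(pct):
    """Return the highest of the 99.9, 99 and 95 percent tiers met, or None."""
    for threshold in (99.9, 99.0, 95.0):
        if pct >= threshold:
            return threshold
    return None
```

For example, an aligned pair that differs at one column in twenty scores 95% and falls in the lowest of the three tiers named in the text.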

[0074] COMPUTER RELATED EMBODIMENTS 
[0075] The nucleotide sequences provided in SEQ ID NOS: 1-5,191, a
representative fragment thereof, or a nucleotide sequence at least 95%, preferably
at least 99% and most preferably at least 99.9% identical to a polynucleotide
sequence of SEQ ID NOS: 1-5,191 may be "provided" in a variety of mediums to
facilitate use thereof. As used herein, "provided" refers to a manufacture, other
than an isolated nucleic acid molecule, which contains a nucleotide sequence of
the present invention; i.e., a nucleotide sequence provided in SEQ ID NOS: 1-5,191,
a representative fragment thereof, or a nucleotide sequence at least 95%,
preferably at least 99% and most preferably at least 99.9% identical to a
polynucleotide of SEQ ID NOS: 1-5,191. Such a manufacture provides a large
portion of the Staphylococcus aureus genome and parts thereof (e.g., a 
Staphylococcus aureus open reading frame (ORF)) in a form which allows a 
skilled artisan to examine the manufacture using means not directly applicable to 
examining the Staphylococcus aureus genome or a subset thereof as it exists in 
nature or in purified form. 

[0076] In one application of this embodiment, a nucleotide sequence of the 
present invention can be recorded on computer readable media. As used herein, 
"computer readable media" refers to any medium which can be read and accessed 
directly by a computer. Such media include, but are not limited to: magnetic 
storage media, such as floppy discs, hard disc storage medium, and magnetic 
tape; optical storage media such as CD-ROM; electrical storage media such as
RAM and ROM; and hybrids of these categories, such as magnetic/optical storage 
media. A skilled artisan can readily appreciate how any of the presently known 
computer readable mediums can be used to create a manufacture comprising 
computer readable medium having recorded thereon a nucleotide sequence of the 
present invention. Likewise, it will be clear to those of skill how additional 
computer readable media that may be developed also can be used to create
analogous manufactures having recorded thereon a nucleotide sequence of the 
present invention. 

[0077] As used herein, "recorded" refers to a process for storing information on 
computer readable medium. A skilled artisan can readily adopt any of the
presently known methods for recording information on computer readable medium
to generate manufactures comprising the nucleotide sequence information of the 
present invention. 

[0078] A variety of data storage structures are available to a skilled artisan for 
creating a computer readable medium having recorded thereon a nucleotide 
sequence of the present invention. The choice of the data storage structure will 
generally be based on the means chosen to access the stored information. In 
addition, a variety of data processor programs and formats can be used to store 
the nucleotide sequence information of the present invention on computer 
readable medium. The sequence information can be represented in a word 
processing text file, formatted in commercially available software such as 
WordPerfect and Microsoft Word, or represented in the form of an ASCII file 
stored in a database application, such as DB2, Sybase, Oracle, or the like. A 
skilled artisan can readily adapt any number of data-processor structuring formats 
(e.g., text file or database) in order to obtain computer readable medium having 
recorded thereon the nucleotide sequence information of the present invention. 
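
By way of non-limiting illustration, the following minimal sketch records sequence information as a plain ASCII text file and reads it back. The header-line layout (a ">" followed by an identifier) is an assumption chosen for the example and is not a format specified herein; the function names and the example record are likewise hypothetical.

```python
import io

def write_records(records, handle, width=60):
    """Write (identifier, sequence) pairs as plain ASCII text, one header line per record."""
    for identifier, sequence in records:
        handle.write(f">{identifier}\n")
        # Wrap the sequence so that each stored line holds at most `width` characters.
        for i in range(0, len(sequence), width):
            handle.write(sequence[i:i + width] + "\n")

def read_records(handle):
    """Read back the (identifier, sequence) pairs written by write_records."""
    records, identifier, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                records.append((identifier, "".join(chunks)))
            identifier, chunks = line[1:], []
        else:
            chunks.append(line)
    if identifier is not None:
        records.append((identifier, "".join(chunks)))
    return records

# Hypothetical record: the identifier and bases are placeholders, not disclosed sequence data.
example = [("contig_1", "ATGGCTAAAGGTCCATTAGC" * 4)]
buffer = io.StringIO()   # stands in for a file on any computer readable medium
write_records(example, buffer)
buffer.seek(0)
restored = read_records(buffer)
```

The same two functions could be pointed at a file opened on disk instead of the in-memory buffer; a database table with identifier and sequence columns would serve equally well.
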

[0079] Computer software is publicly available which allows a skilled artisan to 
access sequence information provided in a computer readable medium. Thus, by 
providing in computer readable form the nucleotide sequences of SEQ ID 
NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, 
preferably at least 99% and most preferably at least 99.9% identical to a sequence 
of SEQ ID NOS:1-5,191, the present invention enables the skilled artisan 
routinely to access the provided sequence information for a wide variety of 
purposes. 

[0080] The examples which follow demonstrate how software which implements 
the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE 
(Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase 
system was used to identify open reading frames (ORFs) within the 
Staphylococcus aureus genome which contain homology to ORFs or proteins 
from both Staphylococcus aureus and other organisms. Among the ORFs 
discussed herein are protein encoding fragments of the Staphylococcus aureus 
genome useful in producing commercially important proteins, such as enzymes 
used in fermentation reactions and in the production of commercially useful 
metabolites. 

[0081] The present invention further provides systems, particularly 
computer-based systems, which contain the sequence information described herein. Such 
systems are designed to identify, among other things, commercially important 
fragments of the Staphylococcus aureus genome. 

[0082] As used herein, "a computer-based system" refers to the hardware means, 
software means, and data storage means used to analyze the nucleotide sequence 
information of the present invention. The minimum hardware means of the 
computer-based systems of the present invention comprises a central processing 
unit (CPU), input means, output means, and data storage means. A skilled artisan 
can readily appreciate that any one of the currently available computer-based 
systems is suitable for use in the present invention. 

[0083] As stated above, the computer-based systems of the present invention 
comprise a data storage means having stored therein a nucleotide sequence of the 
present invention and the necessary hardware means and software means for 
supporting and implementing a search means. 

[0084] As used herein, "data storage means" refers to memory which can store 
nucleotide sequence information of the present invention, or a memory access 
means which can access manufactures having recorded thereon the nucleotide 
sequence information of the present invention. 

[0085] As used herein, "search means" refers to one or more programs which are 
implemented on the computer-based system to compare a target sequence or 
target structural motif with the sequence information stored within the data 
storage means. Search means are used to identify fragments or regions of the 
present genomic sequences which match a particular target sequence or target 
motif. A variety of known algorithms are disclosed publicly, and a variety of 
commercially available software packages for conducting search means are 
available and can be used in the computer-based systems of the present invention. 
Examples of such software include, but are not limited to, MacPattern (EMBL), 
BLASTN and BLASTX (NCBI). A skilled artisan can readily recognize that any one of the 
available algorithms or implementing software packages for conducting 
homology searches can be adapted for use in the present computer-based systems. 

[0086] As used herein, a "target sequence" can be any DNA or amino acid 
sequence of six or more nucleotides or two or more amino acids. A skilled 
artisan can readily recognize that the longer a target sequence is, the less likely a 
target sequence will be present as a random occurrence in the database. The most 
preferred sequence length of a target sequence is from about 10 to 100 amino 
acids or from about 30 to 300 nucleotide residues. However, it is well recognized 
that searches for commercially important fragments, such as sequence fragments 
involved in gene expression and protein processing, may be of shorter length. 
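
The relationship between target length and random occurrence can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that residues are independent and equally frequent, which real genomes only approximate; the function name and the database length used in the loop are hypothetical.

```python
def expected_random_hits(target_length, database_length, alphabet_size=4):
    """Expected number of chance exact matches of one target in a random database.

    Assumes independent, equiprobable residues (4 for nucleotides, 20 for amino
    acids) and counts every position at which the target could start.
    """
    positions = max(database_length - target_length + 1, 0)
    return positions * alphabet_size ** (-target_length)

# Hypothetical database of one million nucleotides, searched on one strand only.
for length in (6, 12, 30):
    print(length, expected_random_hits(length, 1_000_000))
```

Under these assumptions the expected count falls by a factor of four for each added nucleotide, which is why short targets are expected to recur by chance while targets in the preferred length range are not.
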
[0087] As used herein, "a target structural motif," or "target motif," refers to any 
rationally selected sequence or combination of sequences in which the 
sequence(s) are chosen based on a three-dimensional configuration which is 
formed upon the folding of the target motif. There are a variety of target motifs 
known in the art. Protein target motifs include, but are not limited to, enzymic 
active sites and signal sequences. Nucleic acid target motifs include, but are not 
limited to, promoter sequences, hairpin structures and inducible expression 
elements (protein binding sequences). 

[0088] A variety of structural formats for the input and output means can be used 
to input and output the information in the computer-based systems of the present 
invention. A preferred format for an output means ranks fragments of the 
Staphylococcus aureus genomic sequences possessing varying degrees of 
homology to the target sequence or target motif. Such presentation provides a 
skilled artisan with a ranking of sequences which contain various amounts of the 
target sequence or target motif and identifies the degree of homology contained in 
the identified fragment. 
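
A ranked output of this kind can be sketched as follows. The example uses a naive ungapped sliding-window comparison, which is an assumption made for brevity and is not the BLAST or BLAZE algorithm; the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length strings agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def rank_fragments(target, stored, top=5):
    """Rank every window of the stored sequence by identity to the target.

    Returns (percent identity, 1-based start position, fragment) tuples,
    best match first; ties are ordered by position.
    """
    if not target:
        raise ValueError("target sequence must not be empty")
    target, stored = target.upper(), stored.upper()
    hits = []
    for start in range(len(stored) - len(target) + 1):
        fragment = stored[start:start + len(target)]
        hits.append((percent_identity(target, fragment), start + 1, fragment))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top]

# Placeholder sequences, not disclosed sequence data.
ranking = rank_fragments("ACGT", "AAACGTACGTTT", top=2)
```

Each entry pairs a fragment with its degree of identity to the target, so the list can be presented to the artisan directly as a ranking.
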

[0089] A variety of comparing means can be used to compare a target sequence 
or target motif with the data storage means to identify sequence fragments of the 
Staphylococcus aureus genome. In the present examples, software which 
implements the BLAST and BLAZE algorithms, described in Altschul et 
al., J. Mol. Biol. 215:403-410 (1990) and Brutlag et al., Comp. Chem. 17:203-207 
(1993), respectively, was used to identify open reading frames 
within the Staphylococcus aureus genome. A skilled artisan can readily 
recognize that any one of the publicly available homology search programs can be 
used as the search means for the computer-based systems of the present invention. 
Of course, suitable proprietary systems that may be known to those of skill also 
may be employed in this regard. 






[0090] Figure 1 provides a block diagram of a computer system illustrative of 
embodiments of this aspect of the present invention. The computer system 102 
includes a processor 106 connected to a bus 104. Also connected to the bus 104 
are a main memory 108 (preferably implemented as random access memory, 
RAM) and a variety of secondary storage devices 110, such as a hard drive 112 
and a removable medium storage device 114. The removable medium storage 
device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a 
magnetic tape drive, etc. A removable storage medium 116 (such as a floppy 
disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data 
recorded therein may be inserted into the removable medium storage device 114. 
The computer system 102 includes appropriate software for reading the control 
logic and/or the data from the removable medium storage device 114, once it is 
inserted into the removable medium storage device 114. 

[0091] A nucleotide sequence of the present invention may be stored in a well 
known manner in the main memory 108, any of the secondary storage devices 
110, and/or a removable storage medium 116. During execution, software for 
accessing and processing the genomic sequence (such as search tools, comparing 
tools, etc.) resides in main memory 108, in accordance with the requirements and 
operating parameters of the operating system, the hardware system and the 
software program or programs. 



[0092] BIOCHEMICAL EMBODIMENTS 



[0093] Other embodiments of the present invention are directed to isolated 
fragments of the Staphylococcus aureus genome. The fragments of the 
Staphylococcus aureus genome of the present invention include, but are not 
limited to, fragments which encode peptides, hereinafter open reading frames 
(ORFs); fragments which modulate the expression of an operably linked ORF, 
hereinafter expression modulating fragments (EMFs); and fragments which can be 
used to diagnose the presence of Staphylococcus aureus in a sample, hereinafter 
diagnostic fragments (DFs). 

[0094] As used herein, an "isolated nucleic acid molecule" or an "isolated 
fragment of the Staphylococcus aureus genome" refers to a nucleic acid molecule 
possessing a specific nucleotide sequence which has been subjected to 
purification means to reduce, from the composition, the number of compounds 
which are normally associated with the composition. Particularly, the term refers 
to the nucleic acid molecules having the sequences set out in SEQ ID 
NOS:1-5,191, to representative fragments thereof as described above, and to polynucleotides 
at least 95%, preferably at least 99% and especially preferably at least 99.9% 
identical in sequence thereto, also as set out above. 

[0095] A variety of purification means can be used to generate the isolated 
fragments of the present invention. These include, but are not limited to, methods 
which separate constituents of a solution based on charge, solubility, or size. 

[0096] In one embodiment, Staphylococcus aureus DNA can be mechanically 
sheared to produce fragments of 15-20 kb in length. These fragments can then be 
used to generate a Staphylococcus aureus library by inserting them into lambda 
clones as described in the Examples below. Primers flanking, for example, an 
ORF, such as those enumerated in Tables 1-3, can then be generated using 
nucleotide sequence information provided in SEQ ID NOS:1-5,191. Well known 
and routine techniques of PCR cloning then can be used to isolate the ORF from 
the lambda DNA library of Staphylococcus aureus genomic DNA. Thus, given 
the availability of SEQ ID NOS:1-5,191, the information in Tables 1, 2 and 3, and 
the information that may be obtained readily by analysis of the sequences of SEQ 
ID NOS:1-5,191 using methods set out above, those of skill will be enabled by 
the present disclosure to isolate any ORF-containing or other nucleic acid 
fragment of the present invention. 
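
As an illustration of generating primers that flank an ORF from contig sequence information, the following sketch takes the bases immediately upstream of the ORF as a forward primer and the reverse complement of the bases immediately downstream as a reverse primer. The function names, the default primer length and the example contig are assumptions for the example; practical primer design would also weigh melting temperature and specificity, which are not modeled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]

def flanking_primers(contig, orf_start, orf_length, primer_length=20):
    """Return (forward, reverse) primers flanking an ORF on the given contig strand.

    orf_start is the 1-based position of the first ORF nucleotide and
    orf_length is the ORF length in nucleotides.
    """
    first = orf_start - 1                 # 0-based index of the first ORF base
    after = first + orf_length            # 0-based index just past the ORF
    if first < primer_length or after + primer_length > len(contig):
        raise ValueError("not enough flanking sequence in the contig")
    forward = contig[first - primer_length:first].upper()
    reverse = reverse_complement(contig[after:after + primer_length])
    return forward, reverse

# Placeholder contig: 5 filler bases, 5 upstream bases, a 9-base ORF,
# 5 downstream bases, 5 filler bases.
contig = "AAAAA" + "CCGTA" + "ATGAAATAA" + "GGATC" + "TTTTT"
primers = flanking_primers(contig, orf_start=11, orf_length=9, primer_length=5)
```
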

[0097] The isolated nucleic acid molecules of the present invention include, but 
are not limited to single stranded and double stranded DNA, and single stranded 
RNA. 

[0098] As used herein, an "open reading frame," ORF, means a series of triplets 
coding for amino acids without any termination codons and is a sequence 
translatable into protein. 
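
The definition above can be illustrated with a minimal scan for runs of triplets that contain no termination codon. This sketch is not the GeneMark method; it assumes the standard termination codons TAA, TAG and TGA, examines only the three forward frames, and uses hypothetical names and a placeholder sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_open_frames(sequence, min_codons=3):
    """List stop-free runs of triplets in the three forward frames.

    Returns (frame, 1-based first nucleotide, length in nucleotides) tuples,
    where frame 1 begins at the first nucleotide of the sequence.
    """
    sequence = sequence.upper()
    found = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(sequence):
            if sequence[pos:pos + 3] in STOP_CODONS:
                # A termination codon closes the current run of coding triplets.
                if (pos - run_start) // 3 >= min_codons:
                    found.append((frame + 1, run_start + 1, pos - run_start))
                run_start = pos + 3
            pos += 3
        # A run may also extend to the end of the sequence without a stop.
        if (pos - run_start) // 3 >= min_codons:
            found.append((frame + 1, run_start + 1, pos - run_start))
    return found

open_frames = find_open_frames("ATGAAACCCGGGTAA", min_codons=4)
```

Raising min_codons makes the list more restrictive; applying the same scan to the reverse complement would cover the opposite strand.
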

[0099] Tables 1, 2 and 3 list ORFs in the Staphylococcus aureus genomic contigs 
of the present invention that were identified as putative coding regions by the 
GeneMark software using organism-specific second-order Markov probability 
transition matrices. It will be appreciated that other criteria can be used, in 
accordance with well known analytical methods, such as those discussed herein, 
to generate more inclusive, more restrictive or more selective lists. 

[0100] Table 1 sets out ORFs in the Staphylococcus aureus contigs of the present 
invention that are at least 80 amino acids long and that, over a continuous region of at 
least 50 bases, are 95% or more identical (by BLAST analysis) to an S. 
aureus nucleotide sequence available through Genbank in November 1996. 

[0101] Table 2 sets out ORFs in the Staphylococcus aureus contigs of the present 
invention that are not in Table 1 and match, with a BLASTP probability score of 
0.01 or less, a polypeptide sequence available through Genbank by September 
1996. 

[0102] Table 3 sets out ORFs in the Staphylococcus aureus contigs of the present 
invention that do not match significantly, by BLASTP analysis, a polypeptide 
sequence available through Genbank by September 1996. 
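
The criteria separating the three tables can be summarized as a simple decision rule. The sketch below is an interpretation for illustration only: it assumes that an ORF failing any Table 1 criterion is next tested against the Table 2 threshold, and that "not matching significantly" means a BLASTP probability above 0.01 or no match at all. The function and parameter names are hypothetical.

```python
def assign_table(length_aa, nt_identity_pct=0.0, nt_match_length=0,
                 blastp_probability=None):
    """Return 1, 2 or 3 according to the criteria stated for Tables 1-3.

    nt_identity_pct and nt_match_length describe the best match to a known
    S. aureus nucleotide sequence; blastp_probability is the best BLASTP
    probability score, or None when no polypeptide match was found.
    """
    if length_aa >= 80 and nt_identity_pct >= 95.0 and nt_match_length >= 50:
        return 1
    if blastp_probability is not None and blastp_probability <= 0.01:
        return 2
    return 3
```

For example, under these assumptions an ORF of 120 amino acids with a 300-base region 97.5% identical to a known S. aureus sequence is assigned to Table 1, whereas an ORF with no polypeptide match is assigned to Table 3.
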
[0103] In each table, the first and second columns identify the ORF by, 
respectively, contig number and ORF number within the contig; the third column 
indicates the reading frame, taking the first 5' nucleotide of the contig as the start 
of the +1 frame; the fourth column indicates the first nucleotide of the ORF, 
counting from the 5' end of the contig strand; and the fifth column indicates the 
length of each ORF in nucleotides. 
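
The frame convention of the third column can be illustrated for ORFs on the contig strand as given. The sketch assumes only the stated convention that the first 5' nucleotide begins the +1 frame; it does not address how opposite-strand frames are numbered, and the function names and example values are hypothetical.

```python
def forward_reading_frame(first_nucleotide):
    """Frame (1, 2 or 3) of a forward-strand ORF from its 1-based first nucleotide."""
    if first_nucleotide < 1:
        raise ValueError("positions are counted from 1 at the 5' end of the contig")
    return (first_nucleotide - 1) % 3 + 1

def table_row(contig_id, orf_id, first_nucleotide, length_nt):
    """Assemble the first five columns of a table row for a forward-strand ORF."""
    frame = f"+{forward_reading_frame(first_nucleotide)}"
    return (contig_id, orf_id, frame, first_nucleotide, length_nt)

row = table_row(1, 2, 5, 300)   # placeholder values, not entries from the tables
```
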

[0104] In Tables 1 and 2, column six lists the "Reference" for the closest 
matching sequence available through Genbank. These reference numbers are the 
database entry numbers commonly used by those of skill in the art, who will be 
familiar with their denominators. Descriptions of the nomenclature are available 
from the National Center for Biotechnology Information. Column seven in 
Tables 1 and 2 provides the "gene name" of the matching sequence; column eight 
provides the "BLAST identity" score from the comparison of the ORF and the 
homologous gene; and column nine indicates the length in nucleotides of the 
"highest scoring segment pair" identified by the BLAST identity analysis. 

[0105] In Table 3, the last column, column six, indicates the length of each ORF 
in amino acid residues. 

[0106] The concepts of percent identity and percent similarity of two polypeptide 
sequences are well understood in the art. For example, two polypeptides 10 amino 
acids in length which differ at three amino acid positions (e.g., at positions 1, 3 
and 5) are said to have a percent identity of 70%. However, the same two 
polypeptides would be deemed to have a percent similarity of 80% if, for example 
at position 5, the amino acid moieties, although not identical, were "similar" 
(i.e., possessed similar biochemical characteristics). Many programs for analysis 
of nucleotide or amino acid sequence similarity, such as FASTA and BLAST, 
specifically list percent identity of a matching region as an output parameter. 
Thus, for instance, Tables 1 and 2 herein enumerate the "percent identity" of the 
"highest scoring segment pair" in each ORF and its listed relative. Further details 
concerning the algorithms and criteria used for homology searches are provided 
below and are described in the pertinent literature highlighted by the citations 
provided below. 
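
The worked example of two ten-residue polypeptides can be reproduced in a short sketch. The grouping of "similar" amino acids below is a hypothetical simplification chosen for the example, not the scoring scheme of FASTA or BLAST, and the two peptide sequences are invented placeholders that differ at positions 1, 3 and 5.

```python
# Hypothetical similarity groups (one-letter codes); real programs use scoring matrices.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")

def _similar(a, b):
    """True when two residues are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)

def percent_identity(first, second):
    """Percentage of aligned positions holding identical residues."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * same / len(first)

def percent_similarity(first, second):
    """Percentage of aligned positions holding identical or similar residues."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    alike = sum(1 for a, b in zip(first, second) if _similar(a, b))
    return 100.0 * alike / len(first)

peptide_a = "MKTAILGHWY"
peptide_b = "AKGALLGHWY"   # differs at positions 1, 3 and 5; I and L at position 5 are grouped
identity = percent_identity(peptide_a, peptide_b)
similarity = percent_similarity(peptide_a, peptide_b)
```

With these placeholder peptides the identity is 70% and the similarity is 80%, matching the figures in the example above.
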

[0107] It will be appreciated that other criteria can be used to generate more 
inclusive and more exclusive listings of the types set out in the tables. As those 
of skill will appreciate, narrow and broad searches both are useful. Thus, a 
skilled artisan can readily identify ORFs in contigs of the Staphylococcus aureus 
genome other than those listed in Tables 1-3, such as ORFs which are 
overlapping or encoded by the opposite strand of an identified ORF in addition to 
those ascertainable using the computer-based systems of the present invention. 
[0108] As used herein, an "expression modulating fragment," EMF, means a 
series of nucleotide molecules which modulates the expression of an operably 
linked ORF or EMF. 

[0109] As used herein, a sequence is said to "modulate the expression of an 
operably linked sequence" when the expression of the sequence is altered by the 
presence of the EMF. EMFs include, but are not limited to, promoters, and 
promoter modulating sequences (inducible elements). One class of EMFs are 
fragments which induce the expression of an operably linked ORF in response to 
a specific regulatory factor or physiological event. 

[0110] EMF sequences can be identified within the contigs of the Staphylococcus 
aureus genome by their proximity to the ORFs provided in Tables 1-3. An 
intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 
nucleotides in length, taken from any one of the ORFs of Tables 1-3 will 
modulate the expression of an operably linked ORF in a fashion similar to that 
found with the naturally linked ORF sequence. As used herein, an "intergenic 
segment" refers to fragments of the Staphylococcus aureus genome which are 
between two ORF(s) herein described. EMFs also can be identified using known 
EMFs as a target sequence or target motif in the computer-based systems of the 
present invention. Further, the two methods can be combined and used together. 
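
Locating intergenic segments from ORF coordinates can be sketched as below. The sketch assumes the ORFs are given as 1-based start positions and nucleotide lengths on a single contig, and it reports only whole gaps whose length falls within the stated range of about 10 to 200 nucleotides; sub-fragments of longer gaps are not enumerated. The function name and coordinates are hypothetical.

```python
def intergenic_segments(orfs, min_length=10, max_length=200):
    """Return (first, last) 1-based coordinates of gaps between neighboring ORFs.

    orfs is an iterable of (first nucleotide, length in nucleotides) pairs.
    """
    spans = sorted((start, start + length - 1) for start, length in orfs)
    segments = []
    covered_to = spans[0][1] if spans else 0
    for start, end in spans[1:]:
        gap_first, gap_last = covered_to + 1, start - 1
        # Overlapping ORFs give a negative gap length and are skipped here.
        if min_length <= gap_last - gap_first + 1 <= max_length:
            segments.append((gap_first, gap_last))
        covered_to = max(covered_to, end)
    return segments

# Placeholder coordinates: ORFs at 1-90, 121-180 and 600-629.
gaps = intergenic_segments([(1, 90), (121, 60), (600, 30)])
```

Segments found in this way are only candidates; whether a given segment in fact modulates expression is a matter for the trap-vector assay described herein.
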
[0111] The presence and activity of an EMF can be confirmed using an EMF trap 
vector. An EMF trap vector contains a cloning site linked to a marker sequence. 
A marker sequence encodes an identifiable phenotype, such as antibiotic 
resistance or a complementing nutrition auxotrophic factor, which can be 
identified or assayed when the EMF trap vector is placed within an appropriate 
host under appropriate conditions. As described above, an EMF will modulate the 
expression of an operably linked marker sequence. A more detailed discussion of 
various marker sequences is provided below. 

[0112] A sequence which is suspected as being an EMF is cloned in all three 
reading frames in one or more restriction sites upstream from the marker 
sequence in the EMF trap vector. The vector is then transformed into an 
appropriate host using known procedures and the phenotype of the transformed 
host is examined under appropriate conditions. As described above, an EMF will 
modulate the expression of an operably linked marker sequence. 
[0113] As used herein, a "diagnostic fragment," DF, means a series of nucleotide 
molecules which selectively hybridize to Staphylococcus aureus sequences. DFs 
can be readily identified by identifying unique sequences within contigs of the 
Staphylococcus aureus genome, such as by using well-known computer analysis 
software, and by generating and testing probes or amplification primers consisting 
of the DF sequence in an appropriate diagnostic format which determines 
amplification or hybridization selectivity. 
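
One simple form of computer analysis for identifying unique sequences is to list the short subsequences of a contig that do not occur in a set of comparison sequences. The sketch below assumes exact matching on one strand only and uses hypothetical names and placeholder sequences; as stated above, any candidate would still need to be tested for amplification or hybridization selectivity.

```python
def kmers(sequence, k):
    """All overlapping subsequences of length k, in order of occurrence."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def candidate_diagnostic_kmers(target, background, k):
    """Subsequences of the target that are absent from every background sequence."""
    seen = set()
    for other in background:
        seen.update(kmers(other, k))
    unique = []
    for kmer in kmers(target, k):
        if kmer not in seen and kmer not in unique:
            unique.append(kmer)
    return unique

candidates = candidate_diagnostic_kmers("ACGTAC", ["CGTA"], k=3)
```
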

[0114] The sequences falling within the scope of the present invention are not 
limited to the specific sequences herein described, but also include allelic and 
species variations thereof. Allelic and species variations can be routinely 
determined by comparing the sequences provided in SEQ ID NOS:1-5,191, a 
representative fragment thereof, or a nucleotide sequence at least 99% and 
preferably 99.9% identical to SEQ ID NOS:1-5,191, with a sequence from 
another isolate of the same species. Furthermore, to accommodate codon 
variability, the invention includes nucleic acid molecules coding for the same 
amino acid sequences as do the specific ORFs disclosed herein. In other words, 
in the coding region of an ORF, substitution of one codon for another which 
encodes the same amino acid is expressly contemplated. 

[0115] Any specific sequence disclosed herein can be readily screened for errors 
by resequencing a particular fragment, such as an ORF, in both directions (i.e., 
sequencing both strands). Alternatively, error screening can be performed by 
sequencing corresponding polynucleotides of Staphylococcus aureus origin 
isolated by using part or all of the fragments in question as a probe or primer. 
[0116] Each of the ORFs of the Staphylococcus aureus genome disclosed in 
Tables 1, 2 and 3, and the EMFs found 5' to the ORFs, can be used as 
polynucleotide reagents in numerous ways. For example, the sequences can be 
used as diagnostic probes or diagnostic amplification primers to detect the 
presence of a specific microbe in a sample, particularly Staphylococcus aureus. 
Especially preferred in this regard are ORFs such as those of Table 3, which do not 
match previously characterized sequences from other organisms and thus are 
most likely to be highly selective for Staphylococcus aureus. Also particularly 
preferred are ORFs that can be used to distinguish between strains of 
Staphylococcus aureus, particularly those that distinguish medically important 
strains, such as drug-resistant strains. 

[0117] In addition, the fragments of the present invention, as broadly described, 
can be used to control gene expression through triple helix formation or antisense 
DNA or RNA, both of which methods are based on the binding of a 
polynucleotide sequence to DNA or RNA. Triple helix formation optimally 
results in a shut-off of RNA transcription from DNA, while antisense RNA 
hybridization blocks translation of an mRNA molecule into polypeptide. 
Information from the sequences of the present invention can be used to design 
antisense and triple helix-forming oligonucleotides. Polynucleotides suitable for 
use in these methods are usually 20 to 40 bases in length and are designed to be 
complementary to a region of the gene involved in transcription, for triple-helix 
formation, or to the mRNA itself, for antisense inhibition. Both techniques have 
been demonstrated to be effective in model systems, and the requisite techniques 
are well known and involve routine procedures. Triple helix techniques are 
discussed in, for example, Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et 
al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991). 
Antisense techniques in general are discussed in, for instance, Okano, J. 
Neurochem. 56:560 (1991) and OLIGODEOXYNUCLEOTIDES AS 
ANTISENSE INHIBITORS OF GENE EXPRESSION, CRC Press, Boca Raton, 
FL (1988). 
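
The design of an antisense oligonucleotide complementary to a region of an mRNA can be sketched as follows. The sketch assumes simple Watson-Crick pairing and returns a DNA oligonucleotide written 5' to 3'; it enforces the 20 to 40 base length noted above but models nothing about uptake, stability or target accessibility. The function name and the example mRNA are hypothetical.

```python
# Pairing of each mRNA base with the DNA base of the antisense strand.
_PAIR = str.maketrans("ACGU", "TGCA")

def antisense_oligo(mrna, start, length):
    """Return the DNA oligonucleotide (5' to 3') complementary to an mRNA region.

    start is the 1-based position of the first targeted base; length must be
    between 20 and 40 bases.
    """
    if not 20 <= length <= 40:
        raise ValueError("oligonucleotide length should be 20 to 40 bases")
    if start < 1:
        raise ValueError("start is counted from 1")
    region = mrna.upper().replace("T", "U")[start - 1:start - 1 + length]
    if len(region) != length:
        raise ValueError("region extends past the end of the mRNA")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return region.translate(_PAIR)[::-1]

oligo = antisense_oligo("AUGGCUAAAGGUCCAUUAGC", start=1, length=20)
```
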

[0118] The present invention further provides recombinant constructs comprising 
one or more fragments of the Staphylococcus aureus genomic fragments and 
contigs of the present invention. Certain preferred recombinant constructs of the 
present invention comprise a vector, such as a plasmid or viral vector, into which 
a fragment of the Staphylococcus aureus genome has been inserted, in a forward 
or reverse orientation. In the case of a vector comprising one of the ORFs of the 
present invention, the vector may further comprise regulatory sequences, 
including, for example, a promoter, operably linked to the ORF. For vectors 
comprising the EMFs of the present invention, the vector may further comprise a 
marker sequence or heterologous ORF operably linked to the EMF. 
[0119] Large numbers of suitable vectors and promoters are known to those of 
skill in the art and are commercially available for generating the recombinant 
constructs of the present invention. The following vectors are provided by way of 
example. Useful bacterial vectors include phagescript, PsiX174, pBluescript SK 
and KS (+ and -), pNH8a, pNH16a, pNH18a, pNH46a (available from 
Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from 
Pharmacia). Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, 
pXT1, pSG (available from Stratagene); pSVK3, pBPV, pMSG, pSVL (available 
from Pharmacia). 

[0120] Promoter regions can be selected from any desired gene using CAT 
(chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. 
Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial 
promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic 
promoters include CMV immediate early, HSV thymidine kinase, early and late 
SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the 
appropriate vector and promoter is well within the level of ordinary skill in the 
art. 

[0121] The present invention further provides host cells containing any one of the 
isolated fragments of the Staphylococcus aureus genomic fragments and contigs 
of the present invention, wherein the fragment has been introduced into the host 
cell using known methods. The host cell can be a higher eukaryotic host cell, 
such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or a 
procaryotic cell, such as a bacterial cell. 

[0122] A polynucleotide of the present invention, such as a recombinant 
construct comprising an ORF of the present invention, may be introduced into the 
host by a variety of well established techniques that are standard in the art, such 
as calcium phosphate transfection, DEAE-dextran mediated transfection and 
electroporation, which are described in, for instance, Davis, L. et al., BASIC 
METHODS IN MOLECULAR BIOLOGY (1986). 

[0123] A host cell containing one of the fragments of the Staphylococcus aureus 
genomic fragments and contigs of the present invention can be used in 
conventional manners to produce the gene product encoded by the isolated 
fragment (in the case of an ORF) or can be used to produce a heterologous protein 
under the control of the EMF. 

[0124] The present invention further provides isolated polypeptides encoded by 
the nucleic acid fragments of the present invention or by degenerate variants of 
the nucleic acid fragments of the present invention. By "degenerate variant" is 
intended nucleotide fragments which differ from a nucleic acid fragment of the 
present invention (e.g., an ORF) by nucleotide sequence but, due to the 
degeneracy of the Genetic Code, encode an identical polypeptide sequence. 
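
Whether one nucleotide fragment is a degenerate variant of another can be checked by translating both and comparing the results. The sketch below builds the standard genetic code table and assumes both fragments are in frame from their first nucleotide; the function names and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks termination).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}

def translate(dna):
    """Translate a DNA string codon by codon; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def is_degenerate_variant(first, second):
    """True when two different nucleotide sequences encode the same polypeptide."""
    return first.upper() != second.upper() and translate(first) == translate(second)

# Placeholder fragments: both encode Met-Ala-Lys using different codons.
same_protein = is_degenerate_variant("ATGGCTAAA", "ATGGCCAAG")
```
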
[0125] Preferred nucleic acid fragments of the present invention are the ORFs 
depicted in Tables 2 and 3 which encode proteins. 

[0126] A variety of methodologies known in the art can be utilized to obtain any 
one of the isolated polypeptides or proteins of the present invention. At the 
simplest level, the amino acid sequence can be synthesized using commercially 
available peptide synthesizers. This is particularly useful in producing small 
peptides and fragments of larger polypeptides. Such short fragments as may be 
obtained most readily by synthesis are useful, for example, in generating 
antibodies against the native polypeptide, as discussed further below. 
[0127] In an alternative method, the polypeptide or protein is purified from 
bacterial cells which naturally produce the polypeptide or protein. One skilled in 
the art can readily employ well-known methods for isolating polypeptides and 
proteins to isolate and purify polypeptides or proteins of the present invention 
produced naturally by a bacterial strain, or by other methods. Methods for 
isolation and purification that can be employed in this regard include, but are not 
limited to, immunochromatography, HPLC, size-exclusion chromatography, 
ion-exchange chromatography, and immuno-affinity chromatography. 
[0128] The polypeptides and proteins of the present invention also can be 
purified from cells which have been altered to express the desired polypeptide or 
protein. As used herein, a cell is said to be altered to express a desired 
polypeptide or protein when the cell, through genetic manipulation, is made to 
produce a polypeptide or protein which it normally does not produce or which the 
cell normally produces at a lower level. Those skilled in the art can readily adapt 
procedures for introducing and expressing either recombinant or synthetic 
sequences into eukaryotic or prokaryotic cells in order to generate a cell which 
produces one of the polypeptides or proteins of the present invention. 

[0129] Any host/vector system can be used to express one or more of the ORFs 
of the present invention. These include, but are not limited to, eukaryotic hosts 
such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic 
hosts such as E. coli and B. subtilis. The most preferred cells are those which do 
not normally express the particular polypeptide or protein or which express the 
polypeptide or protein at a low natural level. 

[0130] "Recombinant," as used herein, means that a polypeptide or protein is 
derived from recombinant (e.g., microbial or mammalian) expression systems. 
"Microbial" refers to recombinant polypeptides or proteins made in bacterial or 
fungal (e.g., yeast) expression systems. As a product, "recombinant 
microbial" defines a polypeptide or protein essentially free of native endogenous 
substances and unaccompanied by associated native glycosylation. Polypeptides 
or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of 
glycosylation modifications; polypeptides or proteins expressed in yeast will have 
a glycosylation pattern different from that expressed in mammalian cells. 
[0131] "Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. 
Generally, DNA segments encoding the polypeptides and proteins provided by 
this invention are assembled from fragments of the Staphylococcus aureus 
genome and short oligonucleotide linkers, or from a series of oligonucleotides, to 
provide a synthetic gene which is capable of being expressed in a recombinant 
transcriptional unit comprising regulatory elements derived from a microbial or 
viral operon. 

[0132] "Recombinant expression vehicle or vector" refers to a plasmid or phage 
or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. 
The expression vehicle can comprise a transcriptional unit comprising an 
assembly of (1) a genetic regulatory element or elements necessary for gene 
expression in the 
host, including elements required to initiate and maintain transcription at a level 
sufficient for suitable expression of the desired polypeptide, including, for 
example, promoters and, where necessary, an enhancer and a polyadenylation 
signal; (2) a structural or coding sequence which is transcribed into mRNA and 
translated into protein; and (3) appropriate signals to initiate translation at the 
beginning of the desired coding region and terminate translation at its end. 
Structural units intended for use in yeast or eukaryotic expression systems 
preferably include a leader sequence enabling extracellular secretion of translated 
protein by a host cell. Alternatively, where recombinant protein is expressed 
without a leader or transport sequence, it may include an N-terminal methionine 
residue. This residue may or may not be subsequently cleaved from the expressed 
recombinant protein to provide a final product. 

[0133] "Recombinant expression system" means host cells which have stably 
integrated a recombinant transcriptional unit into chromosomal DNA or carry the 
recombinant transcriptional unit extrachromosomally. The cells can be 
prokaryotic or eukaryotic. Recombinant expression systems as defined herein will 
express heterologous polypeptides or proteins upon induction of the regulatory 
elements linked to the DNA segment or synthetic gene to be expressed. 
[0134] Mature proteins can be expressed in mammalian cells, yeast, bacteria, or 
other cells under the control of appropriate promoters. Cell-free translation 
systems can also be employed to produce such proteins using RNAs derived from 
the DNA constructs of the present invention. Appropriate cloning and expression 
vectors for use with prokaryotic and eukaryotic hosts are described in Sambrook 
et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Edition, 
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the 
disclosure of which is hereby incorporated by reference in its entirety. 
[0135] Generally, recombinant expression vectors will include origins of 
replication and selectable markers permitting transformation of the host cell, e.g., 
the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a 
promoter derived from a highly expressed gene to direct transcription of a 
downstream structural sequence. Such promoters can be derived from operons 
encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), alpha-factor, 
acid phosphatase, or heat shock proteins, among others. The heterologous 
structural sequence is assembled in appropriate phase with translation initiation 
and termination sequences, and preferably, a leader sequence capable of directing 
secretion of translated protein into the periplasmic space or extracellular medium. 
Optionally, the heterologous sequence can encode a fusion protein including an 
N-terminal identification peptide imparting desired characteristics, e.g., 
stabilization or simplified purification of expressed recombinant product. 
[0136] Useful expression vectors for bacterial use are constructed by inserting a 
structural DNA sequence encoding a desired protein together with suitable 
translation initiation and termination signals in operable reading phase with a 
functional promoter. The vector will comprise one or more phenotypic selectable 
markers and an origin of replication to ensure maintenance of the vector and, 
when desirable, provide amplification within the host. 
[0137] Suitable prokaryotic hosts for transformation include strains of 
Staphylococcus aureus, E. coli, B. subtilis, Salmonella typhimurium and various 
species within the genera Pseudomonas, Streptomyces, and Staphylococcus. 
Others may also be employed as a matter of choice. 

[0138] As a representative but non-limiting example, useful expression vectors 
for bacterial use can comprise a selectable marker and bacterial origin of 
replication derived from commercially available plasmids comprising genetic 
elements of the well known cloning vector pBR322 (ATCC 37017). Such 
commercial vectors include, for example, pKK223-3 (available from Pharmacia 
Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega Biotec, 
Madison, WI, USA). These pBR322 "backbone" sections are combined with an 
appropriate promoter and the structural sequence to be expressed. 
[0139] Following transformation of a suitable host strain and growth of the host 
strain to an appropriate cell density, the selected promoter, where it is inducible, 
is derepressed or induced by appropriate means (e.g., temperature shift or 
chemical induction) and cells are cultured for an additional period to provide for 
expression of the induced gene product. Thereafter cells are typically harvested, 
generally by centrifugation, disrupted to release expressed protein, generally by 
physical or chemical means, and the resulting crude extract is retained for further 
purification. 

[0140] Various mammalian cell culture systems can also be employed to express 
recombinant protein. Examples of mammalian expression systems include the 
COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:175 
(1981), and other cell lines capable of expressing a compatible vector, for 
example, the C127, 3T3, CHO, HeLa and BHK cell lines. 
[0141] Mammalian expression vectors will comprise an origin of replication, a 
suitable promoter and enhancer, and also any necessary ribosome binding sites, 
polyadenylation site, splice donor and acceptor sites, transcriptional termination 
sequences, and 5' flanking nontranscribed sequences. DNA sequences derived 
from the SV40 viral genome, for example, SV40 origin, early promoter, 
enhancer, splice, and polyadenylation sites may be used to provide the required 
nontranscribed genetic elements. 
[0142] Recombinant polypeptides and proteins produced in bacterial culture are 
usually isolated by initial extraction from cell pellets, followed by one or more 
salting-out, aqueous ion exchange or size exclusion chromatography steps. 
Microbial cells employed in expression of proteins can be disrupted by any 
convenient method, including freeze-thaw cycling, sonication, mechanical 
disruption, or use of cell lysing agents. Protein refolding steps can be used, as 
necessary, in completing configuration of the mature protein. Finally, high 
performance liquid chromatography (HPLC) can be employed for final 
purification steps. 

[0143] An additional aspect of the invention includes Staphylococcus aureus 
polypeptides which are useful as immunodiagnostic antigens and/or 
immunoprotective vaccines, collectively "immunologically useful polypeptides". 
Such immunologically useful polypeptides may be selected from the ORFs 
disclosed herein based on techniques well known in the art and described 
elsewhere herein. The inventors have used the following criteria to select several 
immunologically useful polypeptides: 

[0144] As is known in the art, an amino terminal type I signal sequence directs a 
nascent protein across the plasma and outer membranes to the exterior of the 
bacterial cell. Such outermembrane polypeptides are expected to be 
immunologically useful. According to Izard, J. W. et al., Mol. Microbiol. 13, 
765-773 (1994), polypeptides containing type I signal sequences have the 
following physical attributes: The length of the type I signal sequence is 
approximately 15 to 25 primarily hydrophobic amino acid residues with a net 
positive charge in the extreme amino terminus; the central region of the signal 
sequence must adopt an alpha-helical conformation in a hydrophobic 
environment; and the region surrounding the actual site of cleavage is ideally six 
residues long, with small side-chain amino acids in the -1 and -3 positions. 
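
The type I criteria above can be sketched as a simple screen. The following is a minimal illustration, not the inventors' algorithm: the residue sets, the five-residue amino-terminal window, and the 0.6 hydrophobicity cutoff are assumptions chosen for the example, and the alpha-helical requirement is not tested at all.

```python
# Illustrative heuristic only; residue sets and cutoffs are assumptions.
HYDROPHOBIC = set("AVLIMFWC")  # assumed set of hydrophobic residues
SMALL = set("AGS")             # assumed set of small side-chain residues


def looks_like_type_i_signal(protein: str, cleavage_after: int) -> bool:
    """Check the first `cleavage_after` residues against the type I criteria.

    `cleavage_after` is the number of residues removed by cleavage, so the
    last residue of the candidate signal sequence is position -1.
    """
    signal = protein[:cleavage_after].upper()
    # Length of approximately 15 to 25 residues.
    if not 15 <= len(signal) <= 25:
        return False
    # Net positive charge in the extreme amino terminus (assumed: first 5 residues).
    n_region, core = signal[:5], signal[5:]
    net_charge = sum(r in "KR" for r in n_region) - sum(r in "DE" for r in n_region)
    # Primarily hydrophobic remainder (assumed cutoff: at least 60%).
    hydrophobic_fraction = sum(r in HYDROPHOBIC for r in core) / len(core)
    # Small side-chain residues at the -1 and -3 positions.
    small_at_cleavage = signal[-1] in SMALL and signal[-3] in SMALL
    return net_charge > 0 and hydrophobic_fraction >= 0.6 and small_at_cleavage
```

A candidate cleavage position must be supplied by the caller; in practice every position in the 15 to 25 window could be tried in turn.
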
[0145] Also known in the art is the type IV signal sequence which is an example 
of the several types of functional signal sequences which exist in addition to the 
type I signal sequence detailed above. Although functionally related, the type IV 
signal sequence possesses a unique set of biochemical and physical attributes 
(Strom, M. S. and Lory, S., J. Bacteriol. 174, 7345-7351; 1992). These are 
typically six to eight amino acids with a net basic charge followed by an 
additional sixteen to thirty primarily hydrophobic residues. The cleavage site of a 
type IV signal sequence is typically after the initial six to eight amino acids at the 
extreme amino terminus. In addition, all type IV signal sequences contain a 
phenylalanine residue at the +1 site relative to the cleavage site. 
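
By way of illustration, the type IV attributes just described can be expressed as a short check. This is a sketch under stated assumptions: the hydrophobic residue set, the use of only the first sixteen residues after the cleavage site, and the 0.6 cutoff are hypothetical choices, not values taken from the cited reference.

```python
# Illustrative heuristic only; residue set and cutoff are assumptions.
HYDROPHOBIC = set("AVLIMFWC")  # assumed set of hydrophobic residues


def looks_like_type_iv_signal(protein: str, leader_len: int) -> bool:
    """Check a candidate leader of `leader_len` residues against the type IV criteria."""
    seq = protein.upper()
    # Leader of six to eight amino acids at the extreme amino terminus.
    if not 6 <= leader_len <= 8:
        return False
    leader, mature = seq[:leader_len], seq[leader_len:]
    # At least sixteen residues must follow the cleavage site.
    if len(mature) < 16:
        return False
    # Net basic charge in the leader.
    net_charge = sum(r in "KR" for r in leader) - sum(r in "DE" for r in leader)
    # Primarily hydrophobic stretch after cleavage (assumed: first 16 residues, 60% cutoff).
    stretch = mature[:16]
    hydrophobic_fraction = sum(r in HYDROPHOBIC for r in stretch) / len(stretch)
    # Phenylalanine at the +1 site relative to the cleavage site.
    return net_charge > 0 and mature[0] == "F" and hydrophobic_fraction >= 0.6
```
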
[0146] Studies of the cleavage sites of twenty-six bacterial lipoprotein precursors 
have allowed the definition of a consensus amino acid sequence for lipoprotein 
cleavage. Nearly three-fourths of the bacterial lipoprotein precursors examined 
contained the sequence L-(A,S)-(G,A)-C at positions -3 to +1, relative to the 
point of cleavage (Hayashi, S. and Wu, H. C. Lipoproteins in bacteria. J 
Bioenerg. Biomembr. 22, 451-471; 1990). 
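
The consensus L-(A,S)-(G,A)-C maps directly onto a regular expression. The sketch below is illustrative only; the function name and the choice to report the index of the +1 cysteine are assumptions made for the example.

```python
import re
from typing import List

# L-(A,S)-(G,A)-C at positions -3 to +1; the cysteine is the +1 residue.
LIPOPROTEIN_CONSENSUS = re.compile(r"L[AS][GA]C")


def find_lipoprotein_cleavage_sites(protein: str) -> List[int]:
    """Return the 0-based index of the +1 cysteine for each consensus match."""
    return [m.start() + 3 for m in LIPOPROTEIN_CONSENSUS.finditer(protein.upper())]
```

Because roughly one quarter of the precursors examined did not carry this exact sequence, a search of this kind would be expected to miss some genuine lipoproteins.
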

[0147] It is well known that most anchored proteins found on the surface of 
gram-positive bacteria possess a highly conserved carboxy terminal sequence. More 
than fifty such proteins from organisms such as S. pyogenes, S. mutans, E. 
faecalis, S. pneumoniae, and others, have been identified based on their 
extracellular location and carboxy terminal amino acid sequence (Fischetti, V. A. 
Gram-positive commensal bacteria deliver antigens to elicit mucosal and 
systemic immunity. ASM News 62, 405-410; 1996). The conserved region is 
comprised of six charged amino acids at the extreme carboxy terminus coupled to 
15-20 hydrophobic amino acids presumed to function as a transmembrane domain. 
Immediately adjacent to the transmembrane domain is a six amino acid sequence 
conserved in nearly all proteins examined. The amino acid sequence of this region 
is L-P-X-T-G-X, where X is any amino acid. 
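
A search for the L-P-X-T-G-X sequence near the carboxy terminus might be sketched as follows. The 40-residue limit on what may follow the motif is an assumption (it loosely allows for a 15-20 residue hydrophobic stretch plus the charged tail), and the function name is hypothetical.

```python
import re
from typing import Optional

# L-P-X-T-G-X, where X is any amino acid.
ANCHOR_MOTIF = re.compile(r"LP.TG.")


def find_anchor_motif(protein: str, max_tail: int = 40) -> Optional[int]:
    """Return the 0-based start of an L-P-X-T-G-X match near the carboxy terminus.

    `max_tail` is an assumed upper bound on the number of residues allowed
    after the motif. Returns None when no match lies close enough to the
    carboxy terminus.
    """
    seq = protein.upper()
    hit = None
    for m in ANCHOR_MOTIF.finditer(seq):
        if len(seq) - m.end() <= max_tail:
            hit = m.start()  # keep the match closest to the carboxy terminus
    return hit
```
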

[0148] Amino acid sequence similarity to proteins of known function, as detected by 
BLAST, enables the assignment of putative functions to novel amino acid 
sequences and allows for the selection of proteins thought to function outside the 
cell wall. Such proteins are well known in the art and include those annotated as 
"lipoprotein", "periplasmic", or "antigen". 

[0149] An algorithm for selecting antigenic and immunogenic Staphylococcus 
aureus polypeptides including the foregoing criteria was developed by the present 
inventors. Use of the algorithm by the inventors to select immunologically useful 
Staphylococcus aureus polypeptides resulted in the selection of several ORFs 
which are predicted to be outermembrane-associated proteins. These proteins are 
identified in Table 4, below, and shown in the Sequence Listing as SEQ ID 
NOS:5,192 to 5,255. Thus the amino acid sequence of each of several 
antigenic Staphylococcus aureus polypeptides listed in Table 4 can be determined, 
for example, by locating the amino acid sequence of the ORF in the Sequence 
Listing. Likewise the polynucleotide sequence encoding each ORF can be found 
by locating the corresponding polynucleotide SEQ ID in Tables 1, 2, or 3, and 
finding the corresponding nucleotide sequence in the sequence listing. 
[0150] As will be appreciated by those of ordinary skill in the art, although a 
polypeptide representing an entire ORF may be the closest approximation to a 
protein found in vivo, it is not always technically practical to express a complete 
ORF in vitro. It may be very challenging to express and purify a highly 
hydrophobic protein by common laboratory methods. As a result, the 
immunologically useful polypeptides described herein as SEQ ID NOS:5,192- 
5,255 may have been modified slightly to simplify the production of recombinant 
protein, and are the preferred embodiments. In general, nucleotide sequences 
which encode highly hydrophobic domains, such as those found at the amino 
terminal signal sequence, are excluded for enhanced in vitro expression of the 
polypeptides. Furthermore, any highly hydrophobic amino acid sequences 
occurring at the carboxy terminus are also excluded. Such truncated polypeptides 
include for example the mature forms of the polypeptides expected to exist in 
nature. 

[0151] Those of ordinary skill in the art can identify soluble portions of the 
polypeptides identified in Table 4, and in the case of the truncated polypeptide 
sequences shown as SEQ ID NOS:5,192-5,255, may obtain the complete 
predicted amino acid sequence of each polypeptide by translating the 
corresponding polynucleotide sequence of the corresponding ORF listed in 
Tables 1, 2 and 3 and found in the sequence listing. 
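
Translating an ORF into its predicted amino acid sequence can be illustrated with a short routine. This sketch assumes the standard codon assignments, reads the sequence in frame from its first base, and stops at the first stop codon; it does not treat alternative bacterial start codons (such as GTG or TTG) as methionine, and the function name is hypothetical.

```python
# Standard codon assignments, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna: str) -> str:
    """Translate an in-frame coding sequence up to the first stop codon."""
    seq = dna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        # Codons containing ambiguous bases are reported as "X".
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```
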

[0152] Accordingly, polypeptides comprising the complete amino acid sequence 
of an immunologically useful polypeptide selected from the group of polypeptides 
encoded by the ORFs identified in Table 4, or an amino acid sequence at least 
95% identical thereto, preferably at least 97% identical thereto, and most 
preferably at least 99% identical thereto form an embodiment of the invention; 
in addition, polypeptides comprising an amino acid sequence selected from the 
group of amino acid sequences shown in the sequence listing as SEQ ID 
NOS:5,191-5,255, or an amino acid sequence at least 95% identical thereto, 
preferably at least 97% identical thereto and most preferably at least 99% identical 
thereto, form an embodiment of the invention. Polynucleotides encoding the 
foregoing polypeptides also form part of the invention. 
[0153] In another aspect, the invention provides a peptide or polypeptide 
comprising an epitope-bearing portion of a polypeptide of the invention, 
particularly those epitope-bearing portions (antigenic regions) identified in Table 
4. The epitope-bearing portion is an immunogenic or antigenic epitope of a 
polypeptide of the invention. An "immunogenic epitope" is defined as a part of a 
protein that elicits an antibody response when the whole protein is the 
immunogen. On the other hand, a region of a protein molecule to which an 
antibody can bind is defined as an "antigenic epitope." The number of 
immunogenic epitopes of a protein generally is less than the number of antigenic 
epitopes. See, for instance, Geysen et al., Proc. Natl. Acad. Sci. USA 81:3998-4002 
(1983). 

[0154] As to the selection of peptides or polypeptides bearing an antigenic 
epitope (i.e., that contain a region of a protein molecule to which an antibody can 
bind), it is well known in the art that relatively short synthetic peptides that 
mimic part of a protein sequence are routinely capable of eliciting an antiserum 
that reacts with the partially mimicked protein. See, for instance, Sutcliffe, J. G., 
Shinnick, T. M., Green, N. and Lerner, R. A. (1983) "Antibodies that react with 
predetermined sites on proteins", Science, 219:660-666. Peptides capable of 
eliciting protein-reactive sera are frequently represented in the primary sequence 
of a protein, can be characterized by a set of simple chemical rules, and are 
confined neither to immunodominant regions of intact proteins (i.e., 
immunogenic epitopes) nor to the amino or carboxyl terminals. Antigenic 
epitope-bearing peptides and polypeptides of the invention are therefore useful to 
raise antibodies, including monoclonal antibodies, that bind specifically to a 
polypeptide of the invention. See, for instance, Wilson et al., Cell 37:767-778 
(1984) at 777. 

[0155] Antigenic epitope-bearing peptides and polypeptides of the invention 
preferably contain a sequence of at least seven, more preferably at least nine and 
most preferably between about 15 to about 30 amino acids contained within the 
amino acid sequence of a polypeptide of the invention. Non-limiting examples of 
antigenic polypeptides or peptides that can be used to generate S. aureus specific 
antibodies include: a polypeptide comprising peptides shown in Table 4 below. 
These polypeptide fragments have been determined to bear antigenic epitopes of 
indicated S. aureus proteins by the analysis of the Jameson-Wolf antigenic index, 
a representative sample of which is shown in Figure 3. 
[0156] The epitope-bearing peptides and polypeptides of the invention may be 
produced by any conventional means. See, e.g., Houghten, R. A. (1985) General 
method for the rapid solid-phase synthesis of large numbers of peptides: 
specificity of antigen-antibody interaction at the level of individual amino acids. 
Proc. Natl. Acad. Sci. USA 82:5131-5135; this "Simultaneous Multiple Peptide 
Synthesis (SMPS)" process is further described in U.S. Patent No. 4,631,211 to 
Houghten et al. (1986). Epitope-bearing peptides and polypeptides of the 
invention are used to induce antibodies according to methods well known in the 
art. See, for instance, Sutcliffe et al., supra; Wilson et al., supra; Chow, M. et al., 
Proc. Natl. Acad. Sci. USA 82:910-914; and Bittle, F. J. et al., J. Gen. Virol. 
66:2347-2354 (1985). Immunogenic epitope-bearing peptides of the invention, 
i.e., those parts of a protein that elicit an antibody response when the whole 
protein is the immunogen, are identified according to methods known in the art. 
See, for instance, Geysen et al., supra. Further still, U.S. Patent No. 5,194,392 to 
Geysen (1990) describes a general method of detecting or determining the 
sequence of monomers (amino acids or other compounds) which is a topological 
equivalent of the epitope (i.e., a "mimotope") which is complementary to a 
particular paratope (antigen binding site) of an antibody of interest. More 
generally, U.S. Patent No. 4,433,092 to Geysen (1989) describes a method of 
detecting or determining a sequence of monomers which is a topographical 
equivalent of a ligand which is complementary to the ligand binding site of a 
particular receptor of interest. Similarly, U.S. Patent No. 5,480,971 to Houghten, 
R. A. et al. (1996) on Peralkylated Oligopeptide Mixtures discloses linear 
C1-C7-alkyl peralkylated oligopeptides and sets and libraries of such peptides, as 
well as methods for using such oligopeptide sets and libraries for determining the 
sequence of a peralkylated oligopeptide that preferentially binds to an acceptor 
molecule of interest. Thus, non-peptide analogs of the epitope-bearing peptides 
of the invention also can be made routinely by these methods. 
[0157] Table 4 lists immunologically useful polypeptides identified by an 
algorithm which locates novel Staphylococcus aureus outermembrane proteins, as 
is described above. Also listed are epitopes or "antigenic regions" of each of the 
identified polypeptides. The antigenic regions, or epitopes, are delineated by two 
numbers x-y, where x is the number of the first amino acid in the open reading 
frame included within the epitope and y is the number of the last amino acid in 
the open reading frame included within the epitope. For example, the first 
epitope in ORF 168-6 is comprised of amino acids 36 to 45 of SEQ ID NO:5,192, 
as is described in Table 4. The inventors have identified several epitopes for each 
of the antigenic polypeptides identified in Table 4. Accordingly, forming part of 
the present invention are polypeptides comprising an amino acid sequence of one 
or more antigenic regions identified in Table 4. The invention further provides 
polynucleotides encoding such polypeptides. 
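
The x-y notation for antigenic regions is 1-based and inclusive of both end points, which differs from the 0-based, half-open slicing used by most programming languages. A minimal sketch of the conversion follows; the function name and the error handling are assumptions made for the example.

```python
def extract_region(orf_protein: str, region: str) -> str:
    """Return the residues named by an "x-y" antigenic region.

    x and y are 1-based positions in the open reading frame and both are
    included in the returned peptide.
    """
    first, last = (int(part) for part in region.split("-"))
    if not 1 <= first <= last <= len(orf_protein):
        raise ValueError(
            f"region {region} is outside a sequence of length {len(orf_protein)}"
        )
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    return orf_protein[first - 1:last]
```

Under this convention a region written as 36-45 yields a peptide of ten residues.
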

[0158] The present invention further includes isolated polypeptides, proteins and 
nucleic acid molecules which are substantially equivalent to those herein 
described. As used herein, substantially equivalent can refer both to nucleic acid 
and amino acid sequences, for example a mutant sequence, that varies from a 
reference sequence by one or more substitutions, deletions, or additions, the net 
effect of which does not result in an adverse functional dissimilarity between 
reference and subject sequences. For purposes of the present invention, 
sequences having equivalent biological activity, and equivalent expression 
characteristics are considered substantially equivalent. For purposes of 
determining equivalence, truncation of the mature sequence should be 
disregarded. 

[0159] The invention further provides methods of obtaining homologs from other 
strains of Staphylococcus aureus, of the fragments of the Staphylococcus aureus 
genome of the present invention and homologs of the proteins encoded by the 
ORFs of the present invention. As used herein, a sequence or protein of 
Staphylococcus aureus is defined as a homolog of a fragment of the 
Staphylococcus aureus fragments or contigs or a protein encoded by one of the 
ORFs of the present invention, if it shares significant homology to one of the 
fragments of the Staphylococcus aureus genome of the present invention or a 
protein encoded by one of the ORFs of the present invention. Specifically, by 
using the sequence disclosed herein as a probe or as primers, and techniques such 
as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain 
homologs. 

[0160] As used herein, two nucleic acid molecules or proteins are said to "share 
significant homology" if the two contain regions which possess greater than 85% 
sequence (amino acid or nucleic acid) homology. Preferred homologs in this 
regard are those with more than 90% homology. Especially preferred are those 
with 93% or more homology. Among especially preferred homologs those with 
95% or more homology are particularly preferred. Very particularly preferred 
among these are those with 97% and even more particularly preferred among 
those are homologs with 99% or more homology. The most preferred homologs 
among these are those with 99.9% homology or more. It will be understood that, 
among measures of homology, identity is particularly preferred in this regard. 
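
The graded preferences above can be illustrated by computing percent identity over a pair of pre-aligned sequences and mapping the result onto the stated thresholds. This is a sketch under assumptions: the sequences are taken to be already aligned and of equal length (with "-" marking gaps), identity is divided by the full alignment length, and the 97% level is read as "97% or more"; the tier labels are shorthand for the wording of the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be pre-aligned, equal-length and non-empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(a == b and a != "-" for a, b in pairs)
    return 100.0 * matches / len(aligned_a)


# (threshold, inclusive, label), ordered from most to least stringent.
# "Greater than 85%" and "more than 90%" are strict; the rest are "or more".
TIERS = [
    (99.9, True, "most preferred"),
    (99.0, True, "even more particularly preferred"),
    (97.0, True, "very particularly preferred"),
    (95.0, True, "particularly preferred"),
    (93.0, True, "especially preferred"),
    (90.0, False, "preferred"),
    (85.0, False, "significant homology"),
]


def homology_tier(identity: float) -> str:
    """Return the most stringent tier that the given percent identity satisfies."""
    for threshold, inclusive, label in TIERS:
        if identity > threshold or (inclusive and identity == threshold):
            return label
    return "below threshold"
```
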
[0161] Region specific primers or probes derived from the nucleotide sequence 
provided in SEQ ID NOS:1-5,191 or from a nucleotide sequence at least 95%, 
particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ 
ID NOS:1-5,191 can be used to prime DNA synthesis and PCR amplification, as 
well as to identify colonies containing cloned DNA encoding a homolog. 
Methods suitable to this aspect of the present invention are well known and have 
been described in great detail in many publications such as, for example, Innis et 
al., PCR PROTOCOLS, Academic Press, San Diego, CA (1990). 
[0162] When using primers derived from SEQ ID NOS:1-5,191 or from a 
nucleotide sequence having an aforementioned identity to a sequence of SEQ ID 
NOS:1-5,191, one skilled in the art will recognize that by employing high 
stringency conditions (e.g., annealing at 50-60°C in 6X SSPC and 50% 
formamide, and washing at 50-65°C in 0.5X SSPC) only sequences which are 
greater than 75% homologous to the primer will be amplified. By employing 
lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 
40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences which are 
greater than 40-50% homologous to the primer will also be amplified. 
[0163] When using DNA probes derived from SEQ ID NOS:1-5,191, or from a 
nucleotide sequence having an aforementioned identity to a sequence of SEQ ID 
NOS:1-5,191, for colony/plaque hybridization, one skilled in the art will 
recognize that by employing high stringency conditions (e.g., hybridizing at 
50-65°C in 5X SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), 
sequences having regions which are greater than 90% homologous to the probe 
can be obtained, and that by employing lower stringency conditions (e.g., 
hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 
42°C in 0.5X SSPC), sequences having regions which are greater than 35-45% 
homologous to the probe will be obtained. 

[0164] Any organism can be used as the source for homologs of the present 
invention so long as the organism naturally expresses such a protein or contains 
genes encoding the same. The most preferred organisms for isolating homologs 
are bacteria which are closely related to Staphylococcus aureus. 

[0165] ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION 



[0166] Each ORF provided in Tables 1 and 2 is identified with a function by 
homology to a known gene or polypeptide. As a result, one skilled in the art can 
use the polypeptides of the present invention for commercial, therapeutic and 
industrial purposes consistent with the type of putative identification of the 
polypeptide. Such identifications permit one skilled in the art to use the 
Staphylococcus aureus ORFs in a manner similar to the known type of sequences 
for which the identification is made; for example, to ferment a particular sugar 
source or to produce a particular metabolite. A variety of reviews illustrative of 
this aspect of the invention are available, including the following reviews on the 
industrial use of enzymes, for example, BIOCHEMICAL ENGINEERING AND 
BIOTECHNOLOGY HANDBOOK, 2nd Ed., Macmillan Publications, Ltd. NY 
(1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., 
Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of 
exemplary uses that illustrate this and similar aspects of the present invention are 
discussed below. 

[0167] 1. Biosynthetic Enzymes 

[0168] Open reading frames encoding proteins involved in mediating the catalytic 
reactions of intermediary and macromolecular metabolism, the biosynthesis of 
small molecules, cellular processes and other functions can be used for industrial 
biosynthesis. Such proteins include enzymes involved in the degradation of the 
intermediary products of metabolism, enzymes involved in central intermediary 
metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes 
involved in fermentation, enzymes involved in ATP proton motive force 
conversion, enzymes involved in broad regulatory function, enzymes involved in 
amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes 
involved in cofactor and vitamin synthesis. 

[0169] The various metabolic pathways present in Staphylococcus aureus can be 
identified based on absolute nutritional requirements as well as by examining the 
various enzymes identified in Tables 1-3 and SEQ ID NOS:1-5,191. 
[0170] Of particular interest are polypeptides involved in the degradation of 
intermediary metabolites as well as non-macromolecular metabolism. Such 
enzymes include amylases, glucose oxidases, and catalase. 
[0171] Proteolytic enzymes are another class of commercially important 
enzymes. Proteolytic enzymes find use in a number of industrial processes 
including the processing of flax and other vegetable fibers, in the extraction, 
clarification and depectinization of fruit juices, in the extraction of vegetable oil 
and in the maceration of fruits and vegetables to give unicellular fruits. A 
detailed review of the proteolytic enzymes used in the food industry is provided 
in Rombouts et al., Symbiosis 21:79 (1986) and Voragen et al. in 
BIOCATALYSTS IN AGRICULTURAL BIOTECHNOLOGY, Whitaker et al., 
Eds., American Chemical Society Symposium Series 389:93 (1989). 
[0172] The metabolism of sugars is an important aspect of the primary 
metabolism of Staphylococcus aureus. Enzymes involved in the degradation of 
sugars, such as, particularly, glucose, galactose, fructose and xylose, can be used 
in industrial fermentation. Some of the important sugar transforming enzymes, 
from a commercial viewpoint, include sugar isomerases such as glucose 
isomerase. Other metabolic enzymes have found commercial use, such as glucose 
oxidases, which produce ketogulonic acid (KGA). KGA is an intermediate in the 
commercial production of ascorbic acid using Reichstein's procedure, as 
described in Krueger et al., Biotechnology 6(A), Rhine et al., Eds., Verlag Press, 
Weinheim, Germany (1984). 

[0173] Glucose oxidase (GOD) is commercially available and has been used in 
purified form as well as in an immobilized form for the deoxygenation of beer. 
See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most 
important application of GOD is the industrial scale fermentation of gluconic 
acid. There is a market for gluconic acids, which are used in the detergent, textile, 
leather, photographic, pharmaceutical, food, feed and concrete industries, as 
described, for example, in Bigelis et al., beginning on page 357 in GENE 
MANIPULATIONS AND FUNGI; Benett et al., Eds., Academic Press, New York 
(1985). In addition to industrial applications, GOD has found applications in 
medicine for quantitative determination of glucose in body fluids and, recently, in 
biotechnology for analyzing syrups from starch and cellulose hydrolysates. This 
application is described in Owusu et al., Biochim. Biophys. Acta 872:83 (1986), for 
instance. 



[0174] The main sweetener used in the world today is sugar which comes from 
sugar beets and sugar cane. In the field of industrial enzymes, the glucose 
isomerase process shows the largest expansion in the market today. Initially, 
soluble enzymes were used and later immobilized enzymes were developed 
(Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer 
Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the use of 
glucose-produced high fructose syrups is by far the largest industrial business 
using immobilized enzymes. A review of the industrial use of these enzymes is 
provided by Jorgensen, Starch 40:307 (1988). 

[0175] Proteinases, such as alkaline serine proteinases, are used as detergent 
additives and thus represent one of the largest volumes of microbial enzymes 
used in the industrial sector. Because of their industrial importance, there is a 
large body of published and unpublished information regarding the use of these 
enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure 
Function and Biology, Tang, J., ed., Plenum Press, New York (1977) and Godfrey 
et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner 
et al., Report Industrial Enzymes by 1990, Hel Hepner & Associates, London 
(1986)). 

[0176] Another class of commercially usable proteins of the present invention is 
the microbial lipases, described by, for instance, Macrae et al., Philosophical 
Transactions of the Royal Society of London 310:227 (1985) and Poserke, 
Journal of the American Oil Chemist Society 61:1758 (1984). A major use of 
lipases is in the fat and oil industry for the production of neutral glycerides using 
lipase catalyzed inter-esterification of readily available triglycerides. Applications 
of lipases include use as a detergent additive to facilitate the removal of fats 
from fabrics in the course of the washing procedures. 

[0177] The use of enzymes, and in particular microbial enzymes, as catalysts for 
key steps in the synthesis of complex organic molecules is gaining popularity at a 
great rate. One area of great interest is the preparation of chiral intermediates. 
Preparation of chiral intermediates is of interest to a wide range of synthetic 
chemists particularly those scientists involved with the preparation of new 
pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., 
Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC 
Press, Boca Raton, Florida (1990)). The following reactions catalyzed by 
enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, 
phosphate esters, amides and nitriles, esterification reactions, trans-esterification 
reactions, synthesis of amides, reduction of alkanones and oxoalkanoates, 
oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides, 
and carbon bond forming reactions such as the aldol reaction. 
[0178] When considering the use of an enzyme encoded by one of the ORFs of 
the present invention for biotransformation and organic synthesis it is sometimes 
necessary to consider the respective advantages and disadvantages of using a 
microorganism as opposed to an isolated enzyme. The pros and cons of using a 
whole cell system on the one hand or an isolated partially purified enzyme on the 
other hand have been described in detail by Bud et al., Chemistry in Britain (1987), 
p. 127. 

[0179] Amino transferases, enzymes involved in the biosynthesis and metabolism 
of amino acids, are useful in the catalytic production of amino acids. The 
advantage of using microbial based enzyme systems is that the amino transferase 
enzymes catalyze the stereoselective synthesis of only L-amino acids and 
generally possess uniformly high catalytic rates. A description of the use of 
amino transferases for amino acid production is provided by Roselle-David, 
Methods of Enzymology 136:479 (1987). 

[0180] Another category of useful proteins encoded by the ORFs of the present 
invention includes enzymes involved in nucleic acid synthesis, repair, and 
recombination. A variety of commercially important enzymes have previously 
been isolated from members of Staphylococcus aureus. These include Sau3A 
and Sau96I. 

[0181] 2. Generation of Antibodies 

[0182] As described here, the proteins of the present invention, as well as 
homologs thereof, can be used in a variety of procedures and methods known in the 
art which are currently applied to other proteins. The proteins of the present 
invention can further be used to generate an antibody which selectively binds the 
protein. Such antibodies can be either monoclonal or polyclonal antibodies, as 
well fragments of these antibodies, and humanized forms. 

[0183] The invention further provides antibodies which selectively bind to one of 
the proteins of the present invention and hybridomas which produce these 
antibodies. A hybridoma is an immortalized cell line which is capable of 
secreting a specific monoclonal antibody. 

[0184] In general, techniques for preparing polyclonal and monoclonal antibodies 
as well as hybridomas capable of producing the desired antibody are well known 
in the art (Campbell, A. M., MONOCLONAL ANTIBODY TECHNOLOGY: 
LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR 
BIOLOGY, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. 
Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 
256:495-497 (1975)), as are the trioma technique and the human B-cell hybridoma 
technique (Kozbor et al., Immunology Today 4:72 (1983); Cole et al., 
in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. 
Liss, Inc. (1985), pgs. 77-96). 

[0185] Any animal (mouse, rabbit, etc.) which is known to produce antibodies 
can be immunized with a polypeptide of the present invention. Methods for 
immunization are well known in the art. Such methods include subcutaneous or 
intraperitoneal injection of the polypeptide. One skilled in the art will recognize 
that the amount of the protein encoded by the ORF of the present invention used 
for immunization will vary based on the animal which is immunized, the 
antigenicity of the peptide and the site of injection. 

[0186] The protein which is used as an immunogen may be modified or 
administered in an adjuvant in order to increase the protein's antigenicity. 
Methods of increasing the antigenicity of a protein are well known in the art and 
include, but are not limited to, coupling the antigen with a heterologous protein 
(such as globulin or galactosidase) or including an adjuvant during 
immunization. 

[0187] For monoclonal antibodies, spleen cells from the immunized animals are 
removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and 
allowed to become monoclonal antibody producing hybridoma cells. 

[0188] Any one of a number of methods well known in the art can be used to 
identify the hybridoma cell which produces an antibody with the desired 
characteristics. These include screening the hybridomas with an ELISA assay, 
western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 
175:109-124 (1988)). 

[0189] Hybridomas secreting the desired antibodies are cloned and the class and 
subclass are determined using procedures known in the art (Campbell, A. M., 
Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and 
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands 
(1984)). 

[0190] Techniques described for the production of single chain antibodies (U. S. 
Patent 4,946,778) can be adapted to produce single chain antibodies to proteins of 
the present invention. 

[0191] For polyclonal antibodies, antibody-containing antiserum is isolated from 
the immunized animal and is screened for the presence of antibodies with the 
desired specificity using one of the above-described procedures. 

[0192] The present invention further provides the above-described antibodies in 
detectably labelled form. Antibodies can be detectably labelled through the use 
of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels 
(such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels 
(such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for 
accomplishing such labelling are well known in the art (see, for example, 
Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., 
Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); 
Goding, J. W., J. Immunol. Meth. 13:215 (1976)). 

[0193] The labeled antibodies of the present invention can be used for in vitro, in 
vivo, and in situ assays to identify cells or tissues in which a fragment of the 
Staphylococcus aureus genome is expressed. 

[0194] The present invention further provides the above-described antibodies 
immobilized on a solid support. Examples of such solid supports include plastics 
such as polycarbonate, complex carbohydrates such as agarose and sepharose, 
acrylic resins such as polyacrylamide, and latex beads. Techniques for 
coupling antibodies to such solid supports are well known in the art (Weir, D. M. 
et al, "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific 
Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al, Meth. 
Enzym. 34 Academic Press, N. Y. (1974)). The immobilized antibodies of the 
present invention can be used for in vitro, in vivo, and in situ assays as well as for 
immunoaffinity purification of the proteins of the present invention. 

[0195] 3. Diagnostic Assays and Kits 

[0196] The present invention further provides methods to identify the expression 
of one of the ORFs of the present invention, or homolog thereof, in a test sample, 
using one of the DFs, antigens or antibodies of the present invention. 

[0197] In detail, such methods comprise incubating a test sample with one or 
more of the antibodies, or one or more of the DFs, or one or more antigens of the 
present invention and assaying for binding of the DFs, antigens or antibodies to 
components within the test sample. 

[0198] Conditions for incubating a DF, antigen or antibody with a test sample 
vary. Incubation conditions depend on the format employed in the assay, the 
detection methods employed, and the type and nature of the DF or antibody used 
in the assay. One skilled in the art will recognize that any one of the commonly 
available hybridization, amplification or immunological assay formats can readily 
be adapted to employ the DFs, antigens or antibodies of the present invention. 
Examples of such assays can be found in Chard, T., An Introduction to 
Radioimmunoassay and Related Techniques, Elsevier Science Publishers, 
Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in 
Immunocytochemistry, Academic Press, Orlando, FL Vol. 1 (1982), Vol. 2 
(1983), Vol. 3 (1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays: 
Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science 
Publishers, Amsterdam, The Netherlands (1985); and PCT publication 
WO95/32291, all of which are hereby incorporated herein by reference. 

[0199] The test samples of the present invention include cells, protein or 
membrane extracts of cells, or biological fluids such as sputum, blood, serum, 
plasma, or urine. The test sample used in the above-described method will vary 
based on the assay format, nature of the detection method and the tissues, cells or 
extracts used as the sample to be assayed. Methods for preparing protein extracts 
or membrane extracts of cells are well known in the art and can readily be 
adapted in order to obtain a sample which is compatible with the system utilized. 

[0200] In another embodiment of the present invention, kits are provided which 
contain the necessary reagents to carry out the assays of the present invention. 

[0201] Specifically, the invention provides a compartmentalized kit to receive, in 
close confinement, one or more containers, which comprises: (a) a first container 
comprising one of the DFs, antigens or antibodies of the present invention; and (b) 
one or more other containers comprising one or more of the following: wash 
reagents and reagents capable of detecting the presence of a bound DF, antigen or 
antibody. 

[0202] In detail, a compartmentalized kit includes any kit in which reagents are 
contained in separate containers. Such containers include small glass containers, 
plastic containers or strips of plastic or paper. Such containers allow one to 
efficiently transfer reagents from one compartment to another compartment such 
that the samples and reagents are not cross-contaminated, and the agents or 
solutions of each container can be added in a quantitative fashion from one 
compartment to another. Such containers will include a container which will 
accept the test sample, a container which contains the antibodies used in the 
assay, containers which contain wash reagents (such as phosphate buffered saline, 
Tris-buffers, etc.), and containers which contain the reagents used to detect the 
bound antibody, antigen or DF. 

[0203] Types of detection reagents include labelled nucleic acid probes, labelled 
secondary antibodies, or in the alternative, if the primary antibody is labelled, the 
enzymatic or antibody-binding reagents which are capable of reacting with the 
labelled antibody. One skilled in the art will readily recognize that the disclosed 
DFs, antigens and antibodies of the present invention can be readily incorporated 
into one of the established kit formats which are well known in the art. 

[0204] 4. Screening Assay for Binding Agents 

[0205] Using the isolated proteins of the present invention, the present invention 
further provides methods of obtaining and identifying agents which bind to a 
protein encoded by one of the ORFs of the present invention or to one of the 
Staphylococcus aureus fragments and contigs herein described. 

[0206] In general, such methods comprise steps of: 

(a) contacting an agent with an isolated protein encoded by one of the 
ORFs of the present invention, or an isolated fragment of the Staphylococcus 
aureus genome; and 

(b) determining whether the agent binds to said protein or said fragment. 
[0207] The agents screened in the above assay can be, but are not limited to, 
peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The 
agents can be selected and screened at random or rationally selected or designed 
using protein modeling techniques. 

[0208] For random screening, agents such as peptides, carbohydrates, 
pharmaceutical agents and the like are selected at random and are assayed for 
their ability to bind to the protein encoded by the ORF of the present invention. 
[0209] Alternatively, agents may be rationally selected or designed. As used 
herein, an agent is said to be "rationally selected or designed" when the agent is 
chosen based on the configuration of the particular protein. For example, one 
skilled in the art can readily adapt currently available procedures to generate 
peptides, pharmaceutical agents and the like capable of binding to a specific 
peptide sequence in order to generate rationally designed antipeptide peptides 
(see, for example, Hurby et al., "Application of Synthetic Peptides: Antisense 
Peptides," in Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), 
pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989)), pharmaceutical 
agents, or the like. 

[0210] In addition to the foregoing, one class of agents of the present invention, 
as broadly described, can be used to control gene expression through binding to 
one of the ORFs or EMFs of the present invention. As described above, such 
agents can be randomly screened or rationally designed/selected. Targeting the 
ORF or EMF allows a skilled artisan to design sequence specific or element 
specific agents, modulating the expression of either a single ORF or multiple 
ORFs which rely on the same EMF for expression control. 

[0211] One class of DNA binding agents comprises agents which contain base 
residues which hybridize or form a triple helix by binding to DNA or RNA. Such agents 
can be based on the classic phosphodiester, ribonucleic acid backbone, or can be 
a variety of sulfhydryl or polymeric derivatives which have base attachment 
capacity. 

[0212] Agents suitable for use in these methods usually contain 20 to 40 bases 
and are designed to be complementary to a region of the gene involved in 
transcription (triple helix - see Lee et al., Nucl. Acids Res. 6:3073 (1979); 
Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 
(1991)) or to the mRNA itself (antisense - Okano, J. Neurochem. 56:560 (1991); 
Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, 
Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of 
RNA transcription from DNA, while antisense RNA hybridization blocks 
translation of an mRNA molecule into polypeptide. Both techniques have been 
demonstrated to be effective in model systems. Information contained in the 
sequences of the present invention can be used to design antisense and triple 
helix-forming oligonucleotides, and other DNA binding agents. 
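
As an illustration of the antisense design step described above, the following is a minimal sketch, not a method disclosed herein. It assumes that the target region is supplied as the coding-strand DNA sequence (so that it matches the mRNA with T in place of U) and simply returns the reverse complement of a 20 to 40 base window. The function names and the example sequence are hypothetical and are not taken from the sequences of the present invention.

```python
# Minimal sketch: derive an antisense oligodeoxynucleotide as the reverse
# complement of a 20-40 base window of a coding-strand sequence.
# The function names and the example sequence are hypothetical.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_oligo(coding_strand: str, start: int, length: int = 20) -> str:
    """Return an oligo complementary to coding_strand[start:start + length]."""
    if not 20 <= length <= 40:
        raise ValueError("length should be between 20 and 40 bases")
    target = coding_strand[start:start + length]
    if len(target) < length:
        raise ValueError("window extends past the end of the sequence")
    return reverse_complement(target)


if __name__ == "__main__":
    example = "ATGAAAGGTCTTACCGATCTGGCATTAGC"  # hypothetical sequence
    print(antisense_oligo(example, start=0, length=20))
```

A practical design would also weigh factors such as secondary structure and specificity against the rest of the genome, which this sketch does not attempt.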

[0213] 5. Pharmaceutical Compositions and Vaccines 

[0214] The present invention further provides pharmaceutical agents which can 
be used to modulate the growth or pathogenicity of Staphylococcus aureus, or 
another related organism, in vivo or in vitro. As used herein, a "pharmaceutical 
agent" is defined as a composition of matter which can be formulated using 
known techniques to provide a pharmaceutical composition. As used herein, the 
"pharmaceutical agents of the present invention" refers to the pharmaceutical agents 
which are derived from the proteins encoded by the ORFs of the present invention 
or are agents which are identified using the herein described assays. 
[0215] As used herein, a pharmaceutical agent is said to "modulate the growth or 
pathogenicity of Staphylococcus aureus or a related organism, in vivo or in vitro" 
when the agent reduces the rate of growth, rate of division, or viability of the 
organism in question. The pharmaceutical agents of the present invention can 
modulate the growth or pathogenicity of an organism in many fashions, although 
an understanding of the underlying mechanism of action is not needed to practice 
the use of the pharmaceutical agents of the present invention. Some agents will 
modulate the growth or pathogenicity by binding to an important protein, thus 
blocking the biological activity of the protein, while other agents may bind to a 
component of the outer surface of the organism, blocking attachment or rendering 
the organism more prone to attack by the body's natural immune system. Alternatively, 
the agent may comprise a protein encoded by one of the ORFs of the present 
invention and serve as a vaccine. The development and use of vaccines derived 
from membrane associated polypeptides are well known in the art. The inventors 
have identified particularly preferred immunogenic Staphylococcus aureus 
polypeptides for use as vaccines. Such immunogenic polypeptides are described 
above and summarized in Table 4, below. 

[0216] As used herein, a "related organism" is a broad term which refers to any 
organism whose growth or pathogenicity can be modulated by one of the 
pharmaceutical agents of the present invention. In general, such an organism will 
contain a homolog of the protein which is the target of the pharmaceutical agent 
or the protein used as a vaccine. As such, related organisms do not need to be 
bacterial but may be fungal or viral pathogens. 

[0217] The pharmaceutical agents and compositions of the present invention may 
be administered in a convenient manner, such as by the oral, topical, intravenous, 
intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. 
The pharmaceutical compositions are administered in an amount which is 
effective for treatment and/or prophylaxis of the specific indication. In general, 
they are administered in an amount of at least about 1 mg/kg body weight and in 
most cases they will be administered in an amount not in excess of about 1 g/kg 
body weight per day. In most cases, the dosage is from about 0.1 mg/kg to about 
10 g/kg body weight daily, taking into account the routes of administration, 
symptoms, etc. 

[0218] The agents of the present invention can be used in native form or can be 
modified to form a chemical derivative. As used herein, a molecule is said to be 
a "chemical derivative" of another molecule when it contains additional chemical 
moieties not normally a part of the molecule. Such moieties may improve the 
molecule's solubility, absorption, biological half life, etc. The moieties may 
alternatively decrease the toxicity of the molecule, eliminate or attenuate any 
undesirable side effect of the molecule, etc. Moieties capable of mediating such 
effects are disclosed in, among other sources, REMINGTON'S 
PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein. 
[0219] For example, such moieties may change an immunological character of 
the functional derivative, such as affinity for a given antibody. Such changes in 
immunomodulation activity are measured by the appropriate assay, such as a 
competitive type immunoassay. Modifications of such protein properties as 
redox or thermal stability, biological half-life, hydrophobicity, susceptibility to 
proteolytic degradation or the tendency to aggregate with carriers or into 
multimers also may be effected in this way and can be assayed by methods well 
known to the skilled artisan. 

[0220] The therapeutic effects of the agents of the present invention may be 
obtained by providing the agent to a patient by any suitable means (e.g., 
inhalation, intravenously, intramuscularly, subcutaneously, enterally, or 
parenterally). It is preferred to administer the agent of the present invention so as 
to achieve an effective concentration within the blood or tissue in which the 
growth of the organism is to be controlled. To achieve an effective blood 
concentration, the preferred method is to administer the agent by injection. The 
administration may be by continuous infusion, or by single or multiple injections. 
[0221] In providing a patient with one of the agents of the present invention, the 
dosage of the administered agent will vary depending upon such factors as the 
patient's age, weight, height, sex, general medical condition, previous medical 
history, etc. In general, it is desirable to provide the recipient with a dosage of 
agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of 
patient), although a lower or higher dosage may be administered. The 
therapeutically effective dose can be lowered by using combinations of the agents 
of the present invention or another agent. 

[0222] As used herein, two or more compounds or agents are said to be 
administered "in combination" with each other when either (1) the physiological 
effects of each compound, or (2) the serum concentrations of each compound can 
be measured at the same time. The composition of the present invention can be 
administered concurrently with, prior to, or following the administration of the 
other agent. 

[0223] The agents of the present invention are intended to be provided to 
recipient subjects in an amount sufficient to decrease the rate of growth (as 
defined above) of the target organism. 

[0224] The administration of the agent(s) of the invention may be for either a 
"prophylactic" or "therapeutic" purpose. When provided prophylactically, the 
agent(s) are provided in advance of any symptoms indicative of the organism's 
growth. The prophylactic administration of the agent(s) serves to prevent, 
attenuate, or decrease the rate of onset of any subsequent infection. When 
provided therapeutically, the agent(s) are provided at (or shortly after) the onset of 
an indication of infection. The therapeutic administration of the compound(s) 
serves to attenuate the pathological symptoms of the infection and to increase the 
rate of recovery. 

[0225] The agents of the present invention are administered to a subject, such as a 
mammal, or a patient, in a pharmaceutically acceptable form and in a 
therapeutically effective concentration. A composition is said to be 
"pharmacologically acceptable" if its administration can be tolerated by a 
recipient patient. Such an agent is said to be administered in a "therapeutically 
effective amount" if the amount administered is physiologically significant. An 
agent is physiologically significant if its presence results in a detectable change in 
the physiology of a recipient patient. 

[0226] The agents of the present invention can be formulated according to known 
methods to prepare pharmaceutically useful compositions, whereby these 
materials, or their functional derivatives, are combined in admixture with a 
pharmaceutically acceptable carrier vehicle. Suitable vehicles and their 
formulation, inclusive of other human proteins, e.g., human serum albumin, are 
described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES, 
16th Ed., Osol, A., Ed., Mack Publishing, Easton PA (1980). In order to form a 
pharmaceutically acceptable composition suitable for effective administration, 
such compositions will contain an effective amount of one or more of the agents 
of the present invention, together with a suitable amount of carrier vehicle. 
[0227] Additional pharmaceutical methods may be employed to control the 
duration of action. Controlled release preparations may be achieved through the use 
of polymers to complex or absorb one or more of the agents of the present 
invention. The controlled delivery may be effectuated by a variety of well known 
techniques, including formulation with macromolecules such as, for example, 
polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate, 
methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the 
concentration of the macromolecules and the agent in the formulation, and by 
appropriate use of methods of incorporation, which can be manipulated to 
effectuate a desired time course of release. Another possible method to control 
the duration of action by controlled release preparations is to incorporate agents 
of the present invention into particles of a polymeric material such as polyesters, 
polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. 
Alternatively, instead of incorporating these agents into polymeric particles, it is 
possible to entrap these materials in microcapsules prepared, for example, by 
coacervation techniques or by interfacial polymerization with, for example, 
hydroxymethylcellulose or gelatine-microcapsules and poly(methylmethacrylate) 
microcapsules, respectively, or in colloidal drug delivery systems, for example, 
liposomes, albumin microspheres, microemulsions, nanoparticles, and 
nanocapsules or in macroemulsions. Such techniques are disclosed in 
REMINGTON'S PHARMACEUTICAL SCIENCES (1980). 
[0228] The invention further provides a pharmaceutical pack or kit comprising 
one or more containers filled with one or more of the ingredients of the 
pharmaceutical compositions of the invention. Associated with such container(s) 
can be a notice in the form prescribed by a governmental agency regulating the 
manufacture, use or sale of pharmaceuticals or biological products, which notice 
reflects approval by the agency of manufacture, use or sale for human 
administration. 

[0229] In addition, the agents of the present invention may be employed in 
conjunction with other therapeutic compounds. 

[0230] 6. Shot-Gun Approach to Megabase DNA Sequencing 

[0231] The present invention further demonstrates that a large sequence can be 
sequenced using a random shotgun approach. This procedure, described in detail 
in the examples that follow, has eliminated the up-front cost of isolating and 
ordering overlapping or contiguous subclones prior to the start of the sequencing 
protocols. 

[0232] Certain aspects of the present invention are described in greater detail in 
the examples that follow. The examples are provided by way of illustration. 
Other aspects and embodiments of the present invention are contemplated by the 
inventors, as will be clear to those of skill in the art from reading the present 
disclosure. 

[0233] ILLUSTRATIVE EXAMPLES 

[0234] LIBRARIES AND SEQUENCING 

[0235] 1. Shotgun Sequencing Probability Analysis 

[0236] The overall strategy for a shotgun approach to whole genome sequencing 
follows from the Lander and Waterman (Lander and Waterman, Genomics 2: 
231 (1988)) application of the equation for the Poisson distribution. According to 
this treatment, the probability, $P_0$, that any given base in a sequence of size L, in 
nucleotides, is not sequenced after a certain amount, n, in nucleotides, of random 
sequence has been determined can be calculated by the equation $P_0 = e^{-m}$, where 
m is n/L, the fold coverage. For instance, for a genome of 2.8 Mb, m = 1 when 
2.8 Mb of sequence has been randomly generated (1X coverage). At that point, 
$P_0 = e^{-1} = 0.37$. The probability that any given base has not been sequenced is the 
same as the probability that any region of the whole sequence L has not been 
determined and, therefore, is equivalent to the fraction of the whole sequence that 
has yet to be determined. Thus, at one-fold coverage, approximately 37% of a 
polynucleotide of size L, in nucleotides, has not been sequenced. When 14 Mb of 
sequence has been generated, coverage is 5X for a 2.8 Mb sequence and the 
unsequenced fraction drops to 0.0067 or 0.67%. 5X coverage of a 2.8 Mb sequence 
can be attained by sequencing approximately 17,000 random clones from both insert 
ends with an average sequence read length of 410 bp. 
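
The arithmetic in the preceding paragraph can be checked with a short calculation. The sketch below is illustrative only. It assumes the fold coverage is m = n/L and that each clone contributes two reads of equal average length. The function names are hypothetical.

```python
import math


def unsequenced_fraction(sequenced_nt: float, genome_size_nt: float) -> float:
    """Poisson estimate P0 = exp(-m), with fold coverage m = n / L."""
    m = sequenced_nt / genome_size_nt
    return math.exp(-m)


def clones_for_coverage(fold_coverage: float, genome_size_nt: int,
                        read_length_nt: int, reads_per_clone: int = 2) -> int:
    """Clones needed to reach a given fold coverage when each clone is
    sequenced from both insert ends (reads_per_clone = 2)."""
    total_nt = fold_coverage * genome_size_nt
    return math.ceil(total_nt / (reads_per_clone * read_length_nt))


if __name__ == "__main__":
    L = 2_800_000  # 2.8 Mb sequence
    print(f"1X: P0 = {unsequenced_fraction(L, L):.2f}")
    print(f"5X: P0 = {unsequenced_fraction(14_000_000, L):.4f}")
    print(f"clones for 5X at 410 bp reads: {clones_for_coverage(5, L, 410)}")
```

With these inputs the clone count comes out slightly above 17,000, consistent with the approximate figure given in the text.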

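[0237] Similarly, the total gap length, G, is determined by the equation $G = Le^{-m}$, 
and the average gap size, g, follows the equation $g = L/n$, where n in this 
expression is the number of random sequence reads rather than the number of 
nucleotides sequenced. Thus, 5X coverage (approximately 34,000 reads) 
leaves about 240 gaps averaging about 82 bp in size in a sequence of a 
polynucleotide 2.8 Mb long. 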
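
The gap estimates can be reproduced approximately as follows. This is a sketch under stated assumptions: the average gap size is taken as the sequence length divided by the number of reads, and the number of gaps is estimated as the total gap length divided by the average gap size. The second relation and the function name are assumptions introduced for illustration.

```python
import math


def gap_statistics(genome_size_nt: int, num_reads: int, read_length_nt: int):
    """Return (fold coverage, total gap length, average gap size, gap count)."""
    m = num_reads * read_length_nt / genome_size_nt   # fold coverage
    total_gap = genome_size_nt * math.exp(-m)         # G = L * exp(-m)
    average_gap = genome_size_nt / num_reads          # g = L / (number of reads)
    gap_count = total_gap / average_gap               # assumed: G / g
    return m, total_gap, average_gap, gap_count


if __name__ == "__main__":
    # 17,000 clones read from both ends gives 34,000 reads of about 410 bp.
    m, G, g, count = gap_statistics(2_800_000, 34_000, 410)
    print(f"coverage {m:.2f}X, total gap {G:.0f} bp, "
          f"average gap {g:.0f} bp, about {count:.0f} gaps")
```

Under these assumptions the average gap is about 82 bp and the gap count falls in the low-to-mid 200s, in line with the approximate values stated above.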

[0238] The treatment above is essentially that of Lander and Waterman, 
Genomics 2: 231 (1988). 

[0239] Random Library Construction 

[0240] In order to approximate the random model described above during actual 
sequencing, a nearly ideal library of cloned genomic fragments is required. The 
following library construction procedure was developed to achieve this end. 
[0241] Staphylococcus aureus DNA was prepared by phenol extraction. A 
mixture containing 600 ug DNA in 3.3 ml of 300 mM sodium acetate, 10 mM 
Tris-HCl, 1 mM Na-EDTA, 30% glycerol was sonicated for 1 min. at 0°C in a 
Branson Model 450 Sonicator at the lowest energy setting using a 3 mm probe. 
The sonicated DNA was ethanol precipitated and redissolved in 500 ul TE buffer. 

[0242] To create blunt-ends, a 100 ul aliquot of the resuspended DNA was 
digested with 5 units of BAL31 nuclease (New England BioLabs) for 10 min at 
30°C in 200 ul BAL31 buffer. The digested DNA was phenol-extracted, 
ethanol-precipitated, redissolved in 100 ul TE buffer, and then size-fractionated 
by electrophoresis through a 1.0% low melting temperature agarose gel. The 
section containing DNA fragments 1.6-2.0 kb in size was excised from the gel, 
and the LGT agarose was melted and the resulting solution was extracted with 
phenol to separate the agarose from the DNA. DNA was ethanol precipitated and 
redissolved in 20 ul of TE buffer for ligation to vector. 

[0243] A two-step ligation procedure was used to produce a plasmid library with 
97% inserts, of which >99% were single inserts. The first ligation mixture (50 ul) 
contained 2 ug of DNA fragments, 2 ug pUC18 DNA (Pharmacia) cut with SmaI 
and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 
ligase (GIBCO/BRL) and was incubated at 14°C for 4 hr. The ligation mixture 
then was phenol extracted and ethanol precipitated, and the precipitated DNA was 
dissolved in 20 ul TE buffer and electrophoresed on a 1.0% low melting agarose 
gel. Discrete bands in a ladder were visualized by ethidium bromide-staining and 
UV illumination and identified by size as insert (i), vector (v), v+i, v+2i, v+3i, 
etc. The portion of the gel containing v+i DNA was excised and the v+i DNA 
was recovered and resuspended into 20 ul TE. The v+i DNA then was blunt-ended 
by T4 polymerase treatment for 5 min. at 37°C in a reaction mixture (50 
ul) containing the v+i linears, 500 uM each of the 4 dNTPs, and 9 units of T4 
polymerase (New England BioLabs), under recommended buffer conditions. 
After phenol extraction and ethanol precipitation the repaired v+i linears were 
dissolved in 20 ul TE. The final ligation to produce circles was carried out in a 
50 ul reaction containing 5 ul of v+i linears and 5 units of T4 ligase at 14°C 
overnight. After 10 min. at 70°C the following day, the reaction mixture was 
stored at -20°C. 

[0244] This two-stage procedure resulted in a molecularly random collection of 
single-insert plasmid recombinants with minimal contamination from double- 
insert chimeras (<1%) or free vector (<3%). 

[0245] Since deviation from randomness can arise from propagation of the DNA in 
the host, E. coli host cells deficient in all recombination and restriction functions 
(A. Greener, Strategies 3 (1):5 (1990)) were used to prevent rearrangements, 
deletions, and loss of clones by restriction. Furthermore, transformed cells were 
plated directly on antibiotic diffusion plates to avoid the usual broth recovery 
phase which allows multiplication and selection of the most rapidly growing 
cells. 

[0246] Plating was carried out as follows. A 100 ul aliquot of Epicurian Coli 
SURE II Supercompetent Cells (Stratagene 200152) was thawed on ice and 
transferred to a chilled Falcon 2059 tube on ice. A 1.7 ul aliquot of 1.42 M 
beta-mercaptoethanol was added to the aliquot of cells to a final concentration of 25 
mM. Cells were incubated on ice for 10 min. A 1 ul aliquot of the final ligation 
was added to the cells and incubated on ice for 30 min. The cells were heat 
pulsed for 30 sec. at 42°C and placed back on ice for 2 min. The outgrowth 
period in liquid culture was eliminated from this protocol in order to minimize 
the preferential growth of any given transformed cell. Instead the transformation 
mixture was plated directly on a nutrient rich SOB plate containing a 5 ml bottom 
layer of SOB agar (5% SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 
1.5% Difco Agar per liter of media). The 5 ml bottom layer is supplemented with 
0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB 
agar is supplemented with 1 ml X-Gal (2%), 1 ml MgCl2 (1 M), and 1 ml 
MgSO4 per 100 ml SOB agar. The 15 ml top layer was poured just prior to plating. 
Our titer was approximately 100 colonies/10 ul aliquot of transformation. 
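
The reagent concentrations quoted in the plating protocol can be checked with simple dilution arithmetic. The sketch below is illustrative. It assumes that volumes are additive, and the helper name is hypothetical.

```python
def final_concentration(stock_conc: float, added_volume: float,
                        starting_volume: float) -> float:
    """Concentration after adding a stock to a starting volume.

    The result is in the same units as stock_conc; both volumes must share
    one unit. Volumes are assumed to be additive.
    """
    return stock_conc * added_volume / (added_volume + starting_volume)


if __name__ == "__main__":
    # 1.7 ul of 1.42 M (1420 mM) beta-mercaptoethanol added to 100 ul of cells
    bme_mM = final_concentration(1420.0, 1.7, 100.0)
    # 0.4 ml of 50 mg/ml ampicillin added to 100 ml of SOB agar
    amp_mg_per_ml = final_concentration(50.0, 0.4, 100.0)
    print(f"beta-mercaptoethanol: {bme_mM:.1f} mM")
    print(f"ampicillin: {amp_mg_per_ml * 1000:.0f} ug/ml")
```

The beta-mercaptoethanol value comes out at roughly 24 mM, close to the nominal 25 mM stated in the protocol, and the ampicillin concentration in the bottom layer is roughly 200 ug/ml.
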
[0247] All colonies were picked for template preparation regardless of size. 
Thus, only clones lost due to "poison" DNA or deleterious gene products would 
be deleted from the library, resulting in a slight increase in gap number over that 
expected. 

[0248] Random DNA Sequencing 

[0249] High quality double stranded DNA plasmid templates were prepared using 
an alkaline lysis method developed in collaboration with 5 Prime → 3 Prime, Inc. 
(Boulder, CO). Plasmid preparation was performed in a 96-well format for all 
stages of DNA preparation from bacterial growth through final DNA purification. 
Average template concentration was determined by running 25% of the samples 
on an agarose gel. DNA concentrations were not adjusted. 
[0250] Templates were also prepared from a Staphylococcus aureus lambda 
genomic library. An unamplified library was constructed in Lambda DASH II 
vector (Stratagene). Staphylococcus aureus DNA (> 100 kb) was partially 
digested in a reaction mixture (200 ul) containing 50 ug DNA, 1X Sau3AI buffer, 
and 20 units Sau3AI for 6 min. at 23° C. The digested DNA was phenol-extracted 
and centrifuged over a 10-40% sucrose gradient. Fractions containing genomic DNA 
of 15-25 kb were recovered by precipitation. One ul of fragments was used with 
1 ul of DASH II vector (Stratagene) in the recommended ligation reaction. One ul 
of the ligation mixture was used per packaging reaction following the 
recommended protocol with the Gigapack II XL Packaging Extract. Phage were 
plated directly without amplification from the packaging mixture (after dilution 
with 500 ul of recommended SM buffer and chloroform treatment). Yield was 
about 2.5x10^9 pfu/ul. 

[0251] An amplified library was prepared from the primary packaging mixture 
according to the manufacturer's protocol. The amplified library is stored frozen 
in 7% dimethylsulfoxide. The phage titer is approximately 1x10^9 pfu/ml. 
[0252] Mini-liquid lysates (0.1 ul) are prepared from randomly selected plaques 
and template is prepared by long range PCR. Samples are PCR amplified using 
modified T3 and T7 primers, and Elongase Supermix (LTI). 
[0253] Sequencing reactions are carried out on plasmid templates using a 
combination of two workstations (BIOMEK 1000 and Hamilton Microlab 2200) 
and the Perkin-Elmer 9600 thermocycler with Applied Biosystems PRISM Ready 
Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and 
the M13 reverse (M13RP1) primers. Dye terminator sequencing reactions are 
carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using 
the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. 
Modified T7 and T3 primers are used to sequence the ends of the inserts from the 
Lambda DASH II library. Sequencing reactions are analyzed on a combination of 
ABI 373 DNA Sequencers and ABI 377 DNA Sequencers. All of the dye terminator 
sequencing reactions are analyzed using the 2X 9 hour module on the ABI 377. 
Dye primer reactions are analyzed on a combination of ABI 373 and ABI 377 
DNA Sequencers. The overall sequencing success rate is approximately 
85% for M13-21 and M13RP1 sequences and 65% for dye-terminator 
reactions. The average usable read length is 485 bp for M13-21 sequences, 445 bp 
for M13RP1 sequences, and 375 bp for dye-terminator reactions. 

[0254] Protocol for Automated Cycle Sequencing 
[0255] The sequencing was carried out using Hamilton Microstation 2200, Perkin 
Elmer 9600 thermocyclers, ABI 373 and ABI 377 Automated DNA Sequencers. 
The Hamilton combines pre-aliquoted templates and reaction mixes consisting of 
deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, 
fluorescently-labelled sequencing primers, and reaction buffer. Reaction mixes 
and templates were combined in the wells of a 96-well thermocycling plate and 
transferred to the Perkin Elmer 9600 thermocycler. Thirty consecutive cycles of 
linear amplification (i.e., one primer synthesis) steps were performed including 
denaturation, annealing of primer and template, and extension; i.e., DNA 
synthesis. A heated lid with rubber gaskets on the thermocycling plate prevents 
evaporation without the need for an oil overlay. 

[0256] Two sequencing protocols were used: one for dye-labelled primers and a 
second for dye-labelled dideoxy chain terminators. The shotgun sequencing 
involves use of four dye-labelled sequencing primers, one for each of the four 
terminator nucleotides. Each dye-primer was labelled with a different fluorescent 
dye, permitting the four individual reactions to be combined into one lane of the 
373 or 377 DNA Sequencer for electrophoresis, detection, and base-calling. ABI 
currently supplies pre-mixed reaction mixes in bulk packages containing all the 
necessary non-template reagents for sequencing. Sequencing can be done with 
both plasmid and PCR-generated templates with both dye-primers and 
dye-terminators with approximately equal fidelity, although plasmid templates 
generally give longer usable sequences. 

[0257] Thirty-two reactions were loaded per ABI 373 Sequencer each day and 96 
samples can be loaded on an ABI 377 per day. Electrophoresis was run overnight 
(ABI 373) or for 2 1/2 hours (ABI 377) following the manufacturer's protocols. 
Following electrophoresis and fluorescence detection, the ABI 373 or ABI 377 
performs automatic lane tracking and base-calling. The lane-tracking was 
confirmed visually. Each sequence electropherogram (or fluorescence lane trace) 
was inspected visually and assessed for quality. Trailing sequences of low quality 
were removed and the sequence itself was loaded via software to a Sybase 
database (archived daily to 8mm tape). Leading vector polylinker sequence was 
removed automatically by a software program. Average edited lengths of 
sequences from the standard ABI 373 or ABI 377 were around 400 bp and depend 
mostly on the quality of the template used for the sequencing reaction. 

[0258] INFORMATICS 
[0259] 1. Data Management 

[0260] A number of information management systems for a large-scale 
sequencing lab have been developed. (For review see, for instance, Kerlavage et 
al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on 
System Sciences, IEEE Computer Society Press, Washington D.C., 585 (1993)). 
The system used to collect and assemble the sequence data was developed using 
the Sybase relational database management system and was designed to automate 
data flow wherever possible and to reduce user error. The database stores and 
correlates all information collected during the entire operation from template 
preparation to final analysis of the genome. Because the raw output of the ABI 
373 Sequencers was based on a Macintosh platform and the data management 
system chosen was based on a Unix platform, it was necessary to design and 
implement a variety of multi-user, client-server applications which allow the raw 
data as well as analysis results to flow seamlessly into the database with a 
minimum of user effort. 

[0261] Assembly 

[0262] An assembly engine (TIGR Assembler) developed for the rapid and 
accurate assembly of thousands of sequence fragments was employed to generate 
contigs. The TIGR Assembler simultaneously clusters and assembles fragments 
of the genome. In order to obtain the speed necessary to assemble more than 10^4 
fragments, the algorithm builds a hash table of 12 bp oligonucleotide 
subsequences to generate a list of potential sequence fragment overlaps. The 
number of potential overlaps for each fragment determines which fragments are 
likely to fall into repetitive elements. Beginning with a single seed sequence 
fragment, TIGR Assembler extends the current contig by attempting to add the 
best matching fragment based on oligonucleotide content. The contig and 
candidate fragment are aligned using a modified version of the Smith-Waterman 
algorithm which provides for optimal gapped alignments (Waterman, M. S., 
Methods in Enzymology 164:765 (1988)). The contig is extended by the 
fragment only if strict criteria for the quality of the match are met. The match 
criteria include the minimum length of overlap, the maximum length of an 
unmatched end, and the minimum percentage match. These criteria are 
automatically lowered by the algorithm in regions of minimal coverage and raised 
in regions with a possible repetitive element. Fragments representing the 
boundaries of repetitive elements and potentially chimeric fragments are often 
rejected based on partial mismatches at the ends of alignments and excluded 
from the current contig. TIGR Assembler is 
designed to take advantage of clone size information coupled with sequencing 
from both ends of each template. It enforces the constraint that sequence 
fragments from two ends of the same template point toward one another in the 
contig and are located within a certain range of base pairs (definable for each 
clone based on the known clone size range for a given library). 
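
The hash-table step and the clone-size constraint described above can be illustrated with a short sketch. This is not the TIGR Assembler code; it is a minimal Python illustration under stated assumptions. The function names, the toy sequences, and the insert-size bounds are all hypothetical, and only the 12 bp oligonucleotide length is taken from the text. The sketch indexes every 12-mer, lists fragment pairs that share at least one 12-mer as potential overlaps, counts potential overlap partners per fragment (a high count would flag a likely repetitive element), and checks that two end reads of one clone lie within an allowed distance.

```python
from collections import defaultdict
from itertools import combinations

K = 12  # oligonucleotide length named in the text


def build_kmer_index(fragments, k=K):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=K):
    """Count shared k-mers for every fragment pair sharing at least one."""
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


def overlap_partners(fragments, k=K):
    """Number of distinct potential overlap partners for each fragment."""
    partners = {frag_id: set() for frag_id in fragments}
    for a, b in candidate_overlaps(fragments, k):
        partners[a].add(b)
        partners[b].add(a)
    return {frag_id: len(p) for frag_id, p in partners.items()}


def mates_consistent(fwd_start, rev_end, min_insert, max_insert):
    """True if the two end reads of one clone span a plausible clone size.

    fwd_start is the contig position where the forward read begins and
    rev_end is where its reverse-strand mate ends, so a positive span
    means the reads point toward one another.
    """
    return min_insert <= rev_end - fwd_start <= max_insert


# Toy data (hypothetical): fragments "a" and "b" overlap by 20 bases,
# fragment "c" is unrelated.
genome = "GATTACAGGCTTCAGTCCGATAGCTTGGCATCGAACTGTCCAGGTTACGCTAAGCTGACT"
fragments = {"a": genome[:40], "b": genome[20:], "c": "AT" * 20}

pairs = candidate_overlaps(fragments)
partner_counts = overlap_partners(fragments)
print(pairs)
print(partner_counts)
```

In a fuller implementation each candidate pair would then be passed to a gapped alignment step and accepted only if the match criteria are met; that step is omitted here.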

[0263] Identifying Genes 

[0264] The predicted coding regions of the Staphylococcus aureus genome were 
initially defined with the program zorf, which finds ORFs of a minimum length. 
The predicted coding region sequences were used in searches against a database 
of all Staphylococcus aureus nucleotide sequences from GenBank (release 92.0), 
using the BLASTN search method to identify overlaps of 50 or more nucleotides 
with at least 95% identity. Those ORFs with nucleotide sequence matches are 
shown in Table 1. The ORFs without such matches were translated to protein 
sequences and compared to a non-redundant database of known proteins 
generated by combining the Swiss-Prot, PIR and GenPept databases. ORFs of at 
least 80 amino acids that matched a database protein with BLASTP probability 
less than or equal to 0.01 are shown in Table 2. The table also lists assigned 
functions based on the closest match in the databases. ORFs of at least 120 
amino acids that did not match protein or nucleotide sequences in the databases at 
these levels are shown in Table 3. 
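
Finding ORFs of a minimum length can be sketched as follows. This is not the zorf program, whose details are not given here; it is a minimal Python illustration under stated assumptions. It treats an ORF as an ATG codon followed in frame by a stop codon, scans both strands in all three frames, and ignores alternative bacterial start codons. The function names are hypothetical, and the default of 80 codons merely mirrors the 80 amino acid threshold used for Table 2.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) for each ATG..stop ORF with >= min_codons coding codons.

    start is the index of the A in ATG; end is one past the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=80):
    """Return (strand, start, end) tuples; coordinates are on the scanned strand."""
    seq = seq.upper()
    found = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    found += [("-", s, e)
              for s, e in orfs_in_strand(reverse_complement(seq), min_codons)]
    return found


# Toy sequence (hypothetical): one 6-codon ORF on the forward strand.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

The ORFs returned by such a scan would then be filtered by the database search thresholds described above before being assigned to a table.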



[0265] ILLUSTRATIVE APPLICATIONS 

[0266] 1. Production of an Antibody to a Staphylococcus aureus Protein 
[0267] Substantially pure protein or polypeptide is isolated from the transfected 
or transformed cells using any one of the methods known in the art. The protein 
can also be produced in a recombinant prokaryotic expression system, such as E. 
coli, or can be chemically synthesized. Concentration of protein in the final 
preparation is adjusted, for example, by concentration on an Amicon filter device, 
to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the 
protein can then be prepared as follows. 

[0268] Monoclonal Antibody Production by Hybridoma Fusion 
[0269] Monoclonal antibody to epitopes of any of the peptides identified and 
isolated as described can be prepared from murine hybridomas according to the 
classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or 
modifications of the methods thereof. Briefly, a mouse is repetitively inoculated 
with a few micrograms of the selected protein over a period of a few weeks. The 
mouse is then sacrificed, and the antibody producing cells of the spleen isolated. 
The spleen cells are fused by means of polyethylene glycol with mouse myeloma 
cells, and the excess unfused cells destroyed by growth of the system on selective 
media comprising aminopterin (HAT media). The successfully fused cells are 
diluted and aliquots of the dilution placed in wells of a microtiter plate where 
growth of the culture is continued. Antibody-producing clones are identified by 
detection of antibody in the supernatant fluid of the wells by immunoassay 
procedures, such as ELISA, as originally described by Engvall, E., Meth. 
Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones 
can be expanded and their monoclonal antibody product harvested for use. 
Detailed procedures for monoclonal antibody production are described in Davis, 
L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2 
(1989). 

[0270] Polyclonal Antibody Production by Immunization 
[0271] Polyclonal antiserum containing antibodies to heterogenous epitopes of a 
single protein can be prepared by immunizing suitable animals with the expressed 
protein described above, which can be unmodified or modified to enhance 
immunogenicity. Effective polyclonal antibody production is affected by many 
factors related both to the antigen and the host species. For example, small 
molecules tend to be less immunogenic than others and may require the use of 
carriers and adjuvant. Also, host animals vary in response to site of inoculations 
and dose, with both inadequate or excessive doses of antigen resulting in low titer 
antisera. Small doses (ng level) of antigen administered at multiple intradermal 
sites appear to be most reliable. An effective immunization protocol for rabbits 
can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 
(1971). 

[0272] Booster injections can be given at regular intervals, and antiserum 
harvested when antibody titer thereof, as determined semi-quantitatively, for 
example, by double immunodiffusion in agar against known concentrations of the 
antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 
in: Handbook of Experimental Immunology, Wier, D., ed., Blackwell (1973). 
Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of 
serum (about 12 µM). Affinity of the antisera for the antigen is determined by 
preparing competitive binding curves, as described, for example, by Fisher, D., 
Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, 
eds., Amer. Soc. for Microbiology, Washington, D.C. (1980). 
[0273] Antibody preparations prepared according to either protocol are useful in 
quantitative immunoassays which determine concentrations of antigen-bearing 
substances in biological samples; they are also used semi-quantitatively or 
qualitatively to identify the presence of antigen in a biological sample. In 
addition, they are useful in various animal models of staphylococcal disease 
known to those of skill in the art as a means of evaluating the protein used to 
make the antibody as a potential vaccine target or as a means of evaluating the 
antibody as a potential immunotherapeutic reagent. 

[0274] Preparation of PCR Primers and Amplification of DNA 
[0275] Various fragments of the Staphylococcus aureus genome, such as those of 
Tables 1-3 and SEQ ID NOS: 1-5,191, can be used, in accordance with the present 
invention, to prepare PCR primers for a variety of uses. The PCR primers are 
preferably at least 15 bases, and more preferably at least 18 bases in length. 
When selecting a primer sequence, it is preferred that the primer pairs have 
approximately the same G/C ratio, so that melting temperatures are approximately 
the same. The PCR primers and amplified DNA of this Example find use in the 
Examples that follow. 
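
The primer preferences above (minimum length and similar G/C content within a pair) can be checked with a short sketch. This is a minimal Python illustration under stated assumptions, not a prescribed design procedure. The function names, the example primer sequences, and the 0.10 tolerance on G/C difference are hypothetical, and the melting temperature estimate uses the common rule of thumb of 2° C per A or T and 4° C per G or C, which is only a rough guide for short oligonucleotides.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def rule_of_thumb_tm(primer):
    """Approximate melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def primer_pair_ok(fwd, rev, min_len=15, max_gc_diff=0.10):
    """Check that both primers are long enough and have similar G/C content.

    min_len follows the preferred minimum of 15 bases; max_gc_diff is an
    illustrative tolerance, not a value from the text.
    """
    long_enough = len(fwd) >= min_len and len(rev) >= min_len
    balanced = abs(gc_fraction(fwd) - gc_fraction(rev)) <= max_gc_diff
    return long_enough and balanced


# Hypothetical 18-base primers, not taken from any sequence in the tables.
fwd = "ATGGCTAGCTTAGGCATC"
rev = "TTAGCCGATTGCAAGCTA"
print(rule_of_thumb_tm(fwd), rule_of_thumb_tm(rev), primer_pair_ok(fwd, rev))
```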

[0276] Gene Expression from DNA Sequences Corresponding to ORFs 
[0277] A fragment of the Staphylococcus aureus genome provided in Tables 1-3 
is introduced into an expression vector using conventional technology. 
Techniques to transfer cloned sequences into expression vectors that direct 
protein translation in mammalian, yeast, insect or bacterial expression systems are 
well known in the art. Commercially available vectors and expression systems 
are available from a variety of suppliers including Stratagene (La Jolla, 
California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, 
California). If desired, to enhance expression and facilitate proper protein 
folding, the codon context and codon pairing of the sequence may be optimized 
for the particular expression organism, as explained by Hatfield et al., U.S. 
Patent No. 5,082,767, incorporated herein by this reference. 
[0278] The following is provided as one exemplary method to generate 
polypeptide(s) from cloned ORFs of the Staphylococcus aureus genome 
fragment. Bacterial ORFs generally lack a poly A addition signal. The addition 
signal sequence can be added to the construct by, for example, splicing out the 
poly A addition sequence from pSG5 (Stratagene) using BglI and SalI restriction 
endonuclease enzymes and incorporating it into the mammalian expression vector 
pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the 
LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The 
positions of the LTRs in the construct allow efficient stable transfection. The 
vector includes the Herpes Simplex thymidine kinase promoter and the selectable 
neomycin gene. The Staphylococcus aureus DNA is obtained by PCR from the 
bacterial vector using oligonucleotide primers complementary to the 
Staphylococcus aureus DNA and containing restriction endonuclease sequences 
for PstI incorporated into the 5' primer and BglII at the 5' end of the 
corresponding Staphylococcus aureus DNA 3' primer, taking care to ensure that 
the Staphylococcus aureus DNA is positioned such that it is followed by the poly 
A addition sequence. The purified fragment obtained from the resulting PCR 
reaction is digested with PstI, blunt ended with an exonuclease, digested with 
BglII, purified and ligated to pXT1, now containing a poly A addition sequence 
and digested with BglII. 

[0279] The ligated product is transfected into mouse NIH 3T3 cells using 
Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions 
outlined in the product specification. Positive transfectants are selected after 
growing the transfected cells in 600 ug/ml G418 (Sigma, St. Louis, Missouri). 
The protein is preferably released into the supernatant. However, if the protein 
has membrane binding domains, the protein may additionally be retained within 
the cell or expression may be restricted to the cell surface. Since it may be 
necessary to purify and locate the transfected product, synthetic 15-mer peptides 
synthesized from the predicted Staphylococcus aureus DNA sequence are 
injected into mice to generate antibody to the polypeptide encoded by the 
Staphylococcus aureus DNA. 

[0280] Alternatively, and if antibody production is not possible, the Staphylococcus 
aureus DNA sequence is additionally incorporated into eukaryotic expression 
vectors and expressed as, for example, a globin fusion. Antibody to the globin 
moiety then is used to purify the chimeric protein. Corresponding protease 
cleavage sites are engineered between the globin moiety and the polypeptide 
encoded by the Staphylococcus aureus DNA so that the latter may be freed from 
the former by simple protease digestion. One useful expression vector for 
generating globin chimerics is pSG5 (Stratagene). This vector encodes a rabbit 
globin. Intron II of the rabbit globin gene facilitates splicing of the expressed 
transcript, and the polyadenylation signal incorporated into the construct increases 
the level of expression. These techniques are well known to those skilled in the 
art of molecular biology. Standard methods are published in methods texts such 
as Davis et al., cited elsewhere herein, and many of the methods are available 
from the technical assistance representatives from Stratagene, Life Technologies, 
Inc., or Promega. Polypeptides of the invention also may be produced using in 
vitro translation systems such as the in vitro Express™ Translation Kit (Stratagene). 
[0281] While the present invention has been described in some detail for 
purposes of clarity and understanding, one skilled in the art will appreciate that 
various changes in form and detail can be made without departing from the true 
scope of the invention. All patents, patent applications and publications referred 
to above are hereby incorporated by reference. 